[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
thesamesam wrote: > I put union TBAA under the option that is disabled by default. I am also > considering to put pointer TBAA under option too, if it raises enough > concerns too. Let's please file a bug once this is merged so we remember to come back to that. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -1,7 +1,7 @@ -// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s // dr158: yes Endilll wrote: I'm sorry I missed that you're not writing a new test. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -105,13 +105,34 @@ static bool isValidBaseType(QualType QTy) { if (RD->hasFlexibleArrayMember()) return false; // RD can be struct, union, class, interface or enum. -// For now, we only handle struct and class. -if (RD->isStruct() || RD->isClass()) +if (RD->isStruct() || RD->isClass() || +(RD->isUnion() && CodeGenOpts.UnionTBAA)) return true; } return false; } +// Give unique tag for compatible types. +std::string CodeGenTBAA::getPointeeName(const Type *Ty) { dybv-sc wrote: Changed function to append to raw_ostream. `Ty` here is coming from `getPointeeType()` so it may be not canonical, right? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -1,7 +1,7 @@ -// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s -// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -disable-llvm-passes -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++98 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++11 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++14 %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-linux -std=c++1z %s -O3 -pedantic-errors -emit-llvm -o - | FileCheck %s // dr158: yes dybv-sc wrote: As I see it is already mentioned: https://github.com/llvm/llvm-project/blob/114e6d7ba02f090117f2cb1ffeb9027cf80f335b/clang/test/CXX/drs/dr1xx.cpp#L792C1-L792C25 https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base, if (base.getTBAAInfo().isMayAlias() || rec->hasAttr() || FieldType->isVectorType()) { FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo(); - } else if (rec->isUnion()) { -// TODO: Support TBAA for unions. + } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) { rjmccall wrote: Well, I understand your intent and was asking whether you've put any effort in analyzing whether this rule actually matches the standard's aliasing rules. For what it's worth, I think it does, at least as currently written; IIRC, the committee has considered various weaker rules that permit union members to alias. But it's also true that this is going to make us enforce a stricter rule than we have been. This patch is both implementing the stricter rules and enabling them by default, right? I think we should probably stage those separately, i.e. land this as an opt-in feature and then try turning it on by default in a follow-up commit. That way, if we see miscompiles from this, we can (temporarily) revert the small commit changing the default while we're analyzing that and not have to revert the whole implementation. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -105,13 +105,34 @@ static bool isValidBaseType(QualType QTy) { if (RD->hasFlexibleArrayMember()) return false; // RD can be struct, union, class, interface or enum. -// For now, we only handle struct and class. -if (RD->isStruct() || RD->isClass()) +if (RD->isStruct() || RD->isClass() || +(RD->isUnion() && CodeGenOpts.UnionTBAA)) return true; } return false; } +// Give unique tag for compatible types. +std::string CodeGenTBAA::getPointeeName(const Type *Ty) { rjmccall wrote: Since this is always being used to build a larger string, could you make this append to a `raw_ostream`? I guess `Ty` *is* guaranteed to be a canonical type here, because that's apparently a precondition of `getTypeInfoHelper`. Could you assert that? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { return getChar(); // Handle pointers and references. - // TODO: Implement C++'s type "similarity" and consider dis-"similar" - // pointers distinct. - if (Ty->isPointerType() || Ty->isReferenceType()) -return createScalarTypeNode("any pointer", getChar(), Size); + // Pointer types never alias if their pointee type is distinct. + if ((Ty->isPointerType() || Ty->isReferenceType())) { +llvm::MDNode *AnyPtr = createScalarTypeNode("any pointer", getChar(), Size); +if (!CodeGenOpts.PointerTBAA) + return AnyPtr; +unsigned PtrDepth = 0; +do { + PtrDepth++; + Ty = Ty->getPointeeType().getTypePtr(); +} while (!Ty->getPointeeType().isNull()); rjmccall wrote: You're right, you do need to look through member pointers here, although this means you're also treating member-pointer structure as if it were just pointer structure. I guess that's okay for now. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
dybv-sc wrote: > (As usual, please make any LLVM changes separately from Clang changes, > especially if they affect IR design.) I splitted commit into 2 parts for llvm and clang. Should I make separate PRs for them as well? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -105,13 +105,28 @@ static bool isValidBaseType(QualType QTy) { if (RD->hasFlexibleArrayMember()) return false; // RD can be struct, union, class, interface or enum. -// For now, we only handle struct and class. -if (RD->isStruct() || RD->isClass()) +if (RD->isStruct() || RD->isClass() || +(RD->isUnion() && CodeGenOpts.UnionTBAA)) return true; } return false; } +std::string CodeGenTBAA::getPointeeName(const Type *Ty) { + if (isa(Ty)) { +llvm::MDNode *ScalarMD = getTypeInfoHelper(Ty); +auto = ScalarMD->getOperand(CodeGenOpts.NewStructPathTBAA ? 2 : 0); +assert(isa(Op) && "Expected MDString operand"); +return cast(Op)->getString().str(); + } + + if (Ty->isIncompleteType()) +return ""; dybv-sc wrote: Removed. I was uncertain in a moment what to do in that case, but after revisiting C/C++ standard I learned that there is no difference between complete and incomplete types when considering their similarity/compatibility. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
dybv-sc wrote: @rjmccall , I updated part of commit that handles pointers. Added comment with C/C++ standard references to explain my decisions. Could you please review it again? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
rjmccall wrote: > From all of above, the only way void* and char* alias if they are compatible, > but they are not: Ah, you're right, I had misremembered this. 6.2.5p31 requires them to have the same representation as each other, and there's a footnote about this being meant to imply "interchangeability as arguments to functions, return values from functions, and members of unions", but it doesn't go so far as to make them formally compatible. And even if we wanted to read that as implying a sort of semi-compatibility that ought to affect aliasing, the same paragraph goes on to say that all pointers to structs must have the same representation, and presumably we wouldn't want *that* to affect aliasing. Alright, I withdraw that comment; it's not formally required. We may need to be more conservative about the aliasing of `void*` l-values on practical grounds, but we can let that be informed by results on the ground. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base, if (base.getTBAAInfo().isMayAlias() || rec->hasAttr() || FieldType->isVectorType()) { FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo(); - } else if (rec->isUnion()) { -// TODO: Support TBAA for unions. + } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) { dybv-sc wrote: The goal of that patch is to re-utilize existing struct path TBAA metadata and amend it functionality to work for union types. Here I allow for that metadata to be generated for union members accesses, so yes there will be tbaa struct records with all members with offset 0. Later in code I adapted current struct-path-tbaa tree walkers to handle such situations where multiple fields have same offset. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -216,6 +216,9 @@ ENUM_CODEGENOPT(StructReturnConvention, StructReturnConventionKind, 2, SRCK_Defa CODEGENOPT(RelaxAll , 1, 0) ///< Relax all machine code instructions. CODEGENOPT(RelaxedAliasing , 1, 0) ///< Set when -fno-strict-aliasing is enabled. CODEGENOPT(StructPathTBAA, 1, 0) ///< Whether or not to use struct-path TBAA. +CODEGENOPT(UnionTBAA, 1, 0) ///< Whether or not to use struct-path TBAA on unions. +CODEGENOPT(PointerTBAA, 1, 0) ///< Whether or not to generate TBAA on pointers. +CODEGENOPT(ArrayTBAA, 1, 0) ///< Whether or not to generate TBAA on arrays. rjmccall wrote: Please match the style of surrounding lines, in which the commas are aligned. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { return getChar(); // Handle pointers and references. - // TODO: Implement C++'s type "similarity" and consider dis-"similar" - // pointers distinct. - if (Ty->isPointerType() || Ty->isReferenceType()) -return createScalarTypeNode("any pointer", getChar(), Size); + // Pointer types never alias if their pointee type is distinct. + if ((Ty->isPointerType() || Ty->isReferenceType())) { +llvm::MDNode *AnyPtr = createScalarTypeNode("any pointer", getChar(), Size); +if (!CodeGenOpts.PointerTBAA) + return AnyPtr; +unsigned PtrDepth = 0; +do { + PtrDepth++; + Ty = Ty->getPointeeType().getTypePtr(); +} while (!Ty->getPointeeType().isNull()); rjmccall wrote: `getPointeeType()` will look through a lot of types that you probably don't want to look through, including member pointers. You should write this to specifically look for pointers. (References can't occur in nested positions.) https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -105,13 +105,28 @@ static bool isValidBaseType(QualType QTy) { if (RD->hasFlexibleArrayMember()) return false; // RD can be struct, union, class, interface or enum. -// For now, we only handle struct and class. -if (RD->isStruct() || RD->isClass()) +if (RD->isStruct() || RD->isClass() || +(RD->isUnion() && CodeGenOpts.UnionTBAA)) return true; } return false; } +std::string CodeGenTBAA::getPointeeName(const Type *Ty) { + if (isa(Ty)) { +llvm::MDNode *ScalarMD = getTypeInfoHelper(Ty); +auto = ScalarMD->getOperand(CodeGenOpts.NewStructPathTBAA ? 2 : 0); +assert(isa(Op) && "Expected MDString operand"); +return cast(Op)->getString().str(); + } + + if (Ty->isIncompleteType()) +return ""; rjmccall wrote: Why are incomplete types treated differently here? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -184,13 +199,24 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { return getChar(); // Handle pointers and references. - // TODO: Implement C++'s type "similarity" and consider dis-"similar" - // pointers distinct. - if (Ty->isPointerType() || Ty->isReferenceType()) -return createScalarTypeNode("any pointer", getChar(), Size); + // Pointer types never alias if their pointee type is distinct. rjmccall wrote: This comment both is incorrect and doesn't match the code. What you seem to be implementing here is the C++ similar-type rule, in which what matters is whether the pointer type sub-structure matches while ignoring qualifiers. Unfortunately, this rule seems to treat `void*` and `char*` as different types. That is wrong in C because those types are compatible, and we probably ought to use the C rule even in C++. Also, in general, I would suggest that you write `getPointeeName` much more conservatively rather than assuming that you can just render an arbitrary type and it's going to be okay to treat different renderings as distinct for TBAA purposes. For example, this is going to treat pointers to different vector types as non-aliasing, which worries me a lot because vector programmers are often pretty fast-and-loose. You're also going to be stricter about ObjC than I'm comfortable with. I would strongly suggest just doing this rendering for specific kinds of types, like records, and otherwise having some kind of fallback to the any-pointer metadata. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -4080,7 +4080,11 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E, E->getType(), !getLangOpts().isSignedOverflowDefined(), SignedIndices, E->getExprLoc(), , E->getBase()); EltBaseInfo = ArrayLV.getBaseInfo(); -EltTBAAInfo = CGM.getTBAAInfoForSubobject(ArrayLV, E->getType()); +// If array is member of some aggregate, keep struct path TBAA information +// about it. +EltTBAAInfo = isa(Array) && CGM.getCodeGenOpts().ArrayTBAA + ? ArrayLV.getTBAAInfo() + : CGM.getTBAAInfoForSubobject(ArrayLV, E->getType()); rjmccall wrote: Hmm. Okay, so if I understand correctly, the basic idea here is that TBAA for array types is just TBAA for the underlying element types, so if we have TBAA for an array l-value, whether it's struct-path TBAA or not, subscripting into the array can just preserve that TBAA onto the element. And then that gets complicated by the fact that we apparently actually use *char* as the TBAA for array types unless we're doing struct-path TBAA, so it's quite important that we actually override that or else we basically lose TBAA completely for these subscripts. At the very least, this needs to be reflected in the comment; the overall situation is very non-obvious locally. But I would actually prefer that we just unconditionally change the TBAA we use for array types, because it seems unjustifiable. And as far as I can see, that should be an NFC refactor because it's not actually possible to do accesses directly of array type: arrays just decay into pointers in every context that would otherwise cause an access, and that decay ends up changing the TBAA we'd use anyway. That is, I think you should consider just doing the array part of this patch unconditionally. The union and pointer changes are real increases in precision / risk, though, and should continue to be guarded with flags. https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
@@ -4598,8 +4602,7 @@ LValue CodeGenFunction::EmitLValueForField(LValue base, if (base.getTBAAInfo().isMayAlias() || rec->hasAttr() || FieldType->isVectorType()) { FieldTBAAInfo = TBAAAccessInfo::getMayAliasInfo(); - } else if (rec->isUnion()) { -// TODO: Support TBAA for unions. + } else if (rec->isUnion() && !CGM.getCodeGenOpts().UnionTBAA) { rjmccall wrote: You've checked that this is the right rule for unions? We can just record the union as a containing aggregate that happens to have all the different union members at offset 0? https://github.com/llvm/llvm-project/pull/75177 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits