[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
@@ -196,6 +196,9 @@ C Language Changes number of elements in the flexible array member. This information can improve the results of the array bound sanitizer and the ``__builtin_dynamic_object_size`` builtin. +- Enums will now be represented in TBAA metadata using their actual underlying david-arm wrote: The comment is actually under the heading `C Language Changes` so I think that should be clear enough. Is that ok? https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
@@ -6376,13 +6376,26 @@ aliases a memory access with an access tag ``(BaseTy2, AccessTy2, Offset2)`` if either ``(BaseTy1, Offset1)`` is reachable from ``(Base2, Offset2)`` via the ``Parent`` relation or vice versa. +In C an enum will be compatible with an underlying integer type that is large +enough to hold all enumerated values. In most cases this will be an ``int``, +which is the default when no type is specified. However, if an ``int`` is not paulwalker-arm wrote: I don't think this information belongs in the LangRef, especially as it doesn't add any value to the underlying description of what the metadata represents. https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
@@ -196,6 +196,9 @@ C Language Changes number of elements in the flexible array member. This information can improve the results of the array bound sanitizer and the ``__builtin_dynamic_object_size`` builtin. +- Enums will now be represented in TBAA metadata using their actual underlying paulwalker-arm wrote: Perhaps worth saying `Enums in C...` since the behaviour for C++ is different and remains unchanged. https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
https://github.com/paulwalker-arm approved this pull request. This looks broadly good to me. I suggest reverting the LangRef change because it doesn't add any new information relevant to LLVM IR. https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
https://github.com/david-arm updated https://github.com/llvm/llvm-project/pull/73326 >From af76f6b6b3469fd0f5f24427c5a175c8d9d7c83a Mon Sep 17 00:00:00 2001 From: David Sherwood Date: Fri, 24 Nov 2023 13:20:23 + Subject: [PATCH 1/4] [Clang] Emit TBAA info for enums in C When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type. According to https://en.cppreference.com/w/c/language/enum enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values. When compiling C code this patch generates TBAA information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex. New test added here: clang/test/CodeGen/tbaa.c --- clang/lib/CodeGen/CodeGenTBAA.cpp | 5 +- clang/test/CodeGen/tbaa.c | 116 ++ 2 files changed, 120 insertions(+), 1 deletion(-) create mode 100644 clang/test/CodeGen/tbaa.c diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp index 8705d3d65f1a5..f59d3d422d520 100644 --- a/clang/lib/CodeGen/CodeGenTBAA.cpp +++ b/clang/lib/CodeGen/CodeGenTBAA.cpp @@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { // Enum types are distinct types. In C++ they have "underlying types", // however they aren't related for TBAA. if (const EnumType *ETy = dyn_cast(Ty)) { +if (!Features.CPlusPlus) + return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0)); + // In C++ mode, types have linkage, so we can rely on the ODR and // on their mangled names, if they're external. // TODO: Is there a way to get a program-wide unique name for a // decl with local linkage or no linkage? -if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible()) +if (!ETy->getDecl()->isExternallyVisible()) return getChar(); SmallString<256> OutName; diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c new file mode 100644 index 0..0ab81f60a7194 --- /dev/null +++ b/clang/test/CodeGen/tbaa.c @@ -0,0 +1,116 @@ +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA +// Test TBAA metadata generated by front-end. +// +// NO-TBAA-NOT: !tbaa + +typedef unsigned char uint8_t; +typedef unsigned short uint16_t; +typedef unsigned int uint32_t; +typedef unsigned long long uint64_t; + +typedef enum { + RED_AUTO_32, + GREEN_AUTO_32, + BLUE_AUTO_32 +} EnumAuto32; + +typedef enum { + RED_AUTO_64, + GREEN_AUTO_64, + BLUE_AUTO_64 = 0x1ull +} EnumAuto64; + +typedef enum : uint16_t { + RED_16, + GREEN_16, + BLUE_16 +} Enum16; + +typedef enum : uint8_t { + RED_8, + GREEN_8, + BLUE_8 +} Enum8; + +uint32_t g0(EnumAuto32 *E, uint32_t *val) { +// CHECK-LABEL: define{{.*}} i32 @g0( +// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]] +// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// PATH-LABEL: define{{.*}} i32 @g0( +// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]] +// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] + *val = 5; + *E = RED_AUTO_32; + return *val; +} + +uint64_t g1(EnumAuto64 *E, uint64_t *val) { +// CHECK-LABEL: define{{.*}} i64 @g1( +// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]] +// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]] +// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]] +// PATH-LABEL: define{{.*}} i64 @g1( +// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]] +// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]] +// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]] + *val = 5; + *E = RED_AUTO_64; + return *val; +} + +uint16_t g2(Enum16 *E, uint16_t *val) { +// CHECK-LABEL: define{{.*}} i16 @g2( +// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
momchil-velikov wrote: I thought the suggestion was to add a few lines to https://github.com/llvm/llvm-project/blob/main/clang/docs/ReleaseNotes.rst https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
david-arm wrote: > Do you think it's worth adding something to the Clang release note? Done. Hope the documentation I added makes sense! https://github.com/llvm/llvm-project/pull/73326 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [Clang] Emit TBAA info for enums in C (PR #73326)
https://github.com/david-arm updated https://github.com/llvm/llvm-project/pull/73326 >From af76f6b6b3469fd0f5f24427c5a175c8d9d7c83a Mon Sep 17 00:00:00 2001 From: David Sherwood Date: Fri, 24 Nov 2023 13:20:23 + Subject: [PATCH 1/3] [Clang] Emit TBAA info for enums in C When emitting TBAA information for enums in C code we currently just treat the data as an 'omnipotent char'. However, with C strict aliasing this means we fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there are structs that contain arrays of enums, and clang pessmistically assumes that accesses to those enums could alias with other struct members that have a different type. According to https://en.cppreference.com/w/c/language/enum enums should be treated as 'int' types unless explicitly specified (C23) or if 'int' would not be large enough to hold all the enumerated values. In the latter case the compiler is free to choose a suitable integer that would hold all such values. When compiling C code this patch generates TBAA information for the enum by using an equivalent integer of the size clang has already chosen for the enum. I have ignored C++ for now because the rules are more complex. New test added here: clang/test/CodeGen/tbaa.c --- clang/lib/CodeGen/CodeGenTBAA.cpp | 5 +- clang/test/CodeGen/tbaa.c | 116 ++ 2 files changed, 120 insertions(+), 1 deletion(-) create mode 100644 clang/test/CodeGen/tbaa.c diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp index 8705d3d65f1a5..f59d3d422d520 100644 --- a/clang/lib/CodeGen/CodeGenTBAA.cpp +++ b/clang/lib/CodeGen/CodeGenTBAA.cpp @@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) { // Enum types are distinct types. In C++ they have "underlying types", // however they aren't related for TBAA. if (const EnumType *ETy = dyn_cast(Ty)) { +if (!Features.CPlusPlus) + return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0)); + // In C++ mode, types have linkage, so we can rely on the ODR and // on their mangled names, if they're external. // TODO: Is there a way to get a program-wide unique name for a // decl with local linkage or no linkage? -if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible()) +if (!ETy->getDecl()->isExternallyVisible()) return getChar(); SmallString<256> OutName; diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c new file mode 100644 index 0..0ab81f60a7194 --- /dev/null +++ b/clang/test/CodeGen/tbaa.c @@ -0,0 +1,116 @@ +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefixes=PATH +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA +// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing -disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA +// Test TBAA metadata generated by front-end. +// +// NO-TBAA-NOT: !tbaa + +typedef unsigned char uint8_t; +typedef unsigned short uint16_t; +typedef unsigned int uint32_t; +typedef unsigned long long uint64_t; + +typedef enum { + RED_AUTO_32, + GREEN_AUTO_32, + BLUE_AUTO_32 +} EnumAuto32; + +typedef enum { + RED_AUTO_64, + GREEN_AUTO_64, + BLUE_AUTO_64 = 0x1ull +} EnumAuto64; + +typedef enum : uint16_t { + RED_16, + GREEN_16, + BLUE_16 +} Enum16; + +typedef enum : uint8_t { + RED_8, + GREEN_8, + BLUE_8 +} Enum8; + +uint32_t g0(EnumAuto32 *E, uint32_t *val) { +// CHECK-LABEL: define{{.*}} i32 @g0( +// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]] +// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// PATH-LABEL: define{{.*}} i32 @g0( +// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]] +// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] +// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]] + *val = 5; + *E = RED_AUTO_32; + return *val; +} + +uint64_t g1(EnumAuto64 *E, uint64_t *val) { +// CHECK-LABEL: define{{.*}} i64 @g1( +// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]] +// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]] +// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]] +// PATH-LABEL: define{{.*}} i64 @g1( +// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]] +// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]] +// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]] + *val = 5; + *E = RED_AUTO_64; + return *val; +} + +uint16_t g2(Enum16 *E, uint16_t *val) { +// CHECK-LABEL: define{{.*}} i16 @g2( +// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa