[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
https://github.com/kkwli approved this pull request. LG. Thanks. https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/152454
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen Author: Helena Kotas (hekota) Changes Adds support for accessing individual resources from fixed-size global resource arrays. Design proposal: https://github.com/llvm/wg-hlsl/blob/main/proposals/0028-resource-arrays.md Enables indexing into globally scoped, fixed-size resource arrays to retrieve individual resources. The initialization logic is primarily handled during codegen. When a global resource array is indexed, the codegen translates the `ArraySubscriptExpr` AST node into a constructor call for the corresponding resource record type and binding. To support this behavior, Sema needs to ensure that: - The constructor for the specific resource type is instantiated. - An implicit binding attribute is added to resource arrays that lack explicit bindings (#152452). Closes #145424 Depends on #152450 and #152452. --- Patch is 28.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152454.diff 9 Files Affected: - (modified) clang/include/clang/Sema/SemaHLSL.h (+8-1) - (modified) clang/lib/CodeGen/CGExpr.cpp (+10) - (modified) clang/lib/CodeGen/CGHLSLRuntime.cpp (+211-12) - (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+6) - (modified) clang/lib/CodeGen/CodeGenModule.cpp (+2-2) - (modified) clang/lib/Sema/SemaHLSL.cpp (+70-23) - (added) clang/test/CodeGenHLSL/resources/res-array-global-multi-dim.hlsl (+32) - (added) clang/test/CodeGenHLSL/resources/res-array-global.hlsl (+59) - (modified) clang/test/CodeGenHLSL/static-local-ctor.hlsl (+3-2) ``diff diff --git a/clang/include/clang/Sema/SemaHLSL.h b/clang/include/clang/Sema/SemaHLSL.h index 085c9ed9f3ebd..0c215c6e10013 100644 --- a/clang/include/clang/Sema/SemaHLSL.h +++ b/clang/include/clang/Sema/SemaHLSL.h @@ -229,10 +229,17 @@ class SemaHLSL : public SemaBase { void diagnoseAvailabilityViolations(TranslationUnitDecl *TU); - bool initGlobalResourceDecl(VarDecl *VD); uint32_t getNextImplicitBindingOrderID() { return ImplicitBindingNextOrderID++; } + + bool initGlobalResourceDecl(VarDecl *VD); + bool initGlobalResourceArrayDecl(VarDecl *VD); + void createResourceRecordCtorArgs(const Type *ResourceTy, StringRef VarName, +HLSLResourceBindingAttr *RBA, +HLSLVkBindingAttr *VkBinding, +uint32_t ArrayIndex, +llvm::SmallVector &Args); }; } // namespace clang diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index ed35a055d8a7f..8c34fb501a3b8 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -16,6 +16,7 @@ #include "CGCall.h" #include "CGCleanup.h" #include "CGDebugInfo.h" +#include "CGHLSLRuntime.h" #include "CGObjCRuntime.h" #include "CGOpenMPRuntime.h" #include "CGRecordLayout.h" @@ -4532,6 +4533,15 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E, LHS.getBaseInfo(), TBAAAccessInfo()); } + // The HLSL runtime handle the subscript expression on global resource arrays. + if (getLangOpts().HLSL && (E->getType()->isHLSLResourceRecord() || + E->getType()->isHLSLResourceRecordArray())) { +std::optional LV = +CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this); +if (LV.has_value()) + return *LV; + } + // All the other cases basically behave like simple offsetting. // Handle the extvector case we ignored above. 
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp index 918cb3e38448d..a09e540367a18 100644 --- a/clang/lib/CodeGen/CGHLSLRuntime.cpp +++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp @@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion RootSigVer, RootSignatureValMD->addOperand(MDVals); } +// If the specified expr is a simple decay from an array to pointer, +// return the array subexpression. Otherwise, return nullptr. +static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) { + const auto *CE = dyn_cast(E); + if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay) +return nullptr; + return CE->getSubExpr(); +} + +// Find array variable declaration from nested array subscript AST nodes +static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) { + const Expr *E = nullptr; + while (ASE != nullptr) { +E = getSubExprFromArrayDecayOperand(ASE->getBase()); +if (!E) + return nullptr; +ASE = dyn_cast(E); + } + if (const DeclRefExpr *DRE = dyn_cast_or_null(E)) +return DRE->getDecl(); + return nullptr; +} + +// Get the total size of the array, or -1 if the array is unbounded. +static int getTotalArraySize(const clang::Type *Ty) { + assert(Ty->isArrayType() && "expected array type"); + if (Ty->isIncompleteArrayType()) +return -1; + int Size = 1; + while (const
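The `getTotalArraySize` helper above is cut off by the patch truncation. As a rough illustration of the idea it implements (flattening the extents of a possibly multi-dimensional fixed-size array so that one binding covers every element), here is a standalone, hypothetical sketch against the public clang AST API. The function name and the restriction to bounded arrays are mine, not the PR's:

```cpp
// Illustrative sketch only, not the PR's truncated getTotalArraySize():
// multiply the extents of nested ConstantArrayTypes, e.g. a [4][3] array
// of resources flattens to 12 elements.
#include "clang/AST/Type.h"
#include "llvm/Support/Casting.h"
#include <cstdint>

static uint64_t flattenedConstantArraySize(const clang::Type *Ty) {
  uint64_t Size = 1;
  while (const auto *CAT = llvm::dyn_cast<clang::ConstantArrayType>(Ty)) {
    Size *= CAT->getSize().getZExtValue(); // extent of this dimension
    Ty = CAT->getElementType()->getUnqualifiedDesugaredType();
  }
  return Size; // 1 for a non-array type, product of extents otherwise
}
```

A real caller still has to special-case unbounded (incomplete) arrays, which the PR's helper signals by returning -1.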
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
https://github.com/hekota ready_for_review https://github.com/llvm/llvm-project/pull/152454
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)
changpeng wrote:
> Rebase and updated new test checks. @changpeng, could you please verify if the AMDGPU/no-folding-imm-to-inst-with-fi.ll test that #151263 recently added still does what it is supposed to do with the updated checks in this PR?

It is good (as long as it passes).

https://github.com/llvm/llvm-project/pull/146076
[llvm-branch-commits] [llvm] release/21.x: [DAG] visitFREEZE - limit freezing of multiple operands (PR #150425)
RKSimon wrote: @tru @nikic Is there anything that I still need to do here? https://github.com/llvm/llvm-project/pull/150425
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
https://github.com/hekota ready_for_review https://github.com/llvm/llvm-project/pull/152452
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
llvmbot wrote: @llvm/pr-subscribers-clang @llvm/pr-subscribers-hlsl Author: Helena Kotas (hekota) Changes If a resource array does not have an explicit binding attribute, SemaHLSL will add an implicit one. The attribute will be used to transfer implicit binding order ID to the codegen, the same way as it is done for HLSLBufferDecls. This is necessary in order to generate correct initialization of resources in an array that does not have an explicit binding. Depends on #152450 Part 1 of #145424 --- Full diff: https://github.com/llvm/llvm-project/pull/152452.diff 2 Files Affected: - (modified) clang/lib/Sema/SemaHLSL.cpp (+44-11) - (modified) clang/test/AST/HLSL/resource_binding_attr.hlsl (+20) ``diff diff --git a/clang/lib/Sema/SemaHLSL.cpp b/clang/lib/Sema/SemaHLSL.cpp index 17f17f8114373..6811f3f27603b 100644 --- a/clang/lib/Sema/SemaHLSL.cpp +++ b/clang/lib/Sema/SemaHLSL.cpp @@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) { llvm_unreachable("unexpected ResourceClass value"); } +static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) { + return getRegisterType(ResTy->getAttrs().ResourceClass); +} + // Converts the first letter of string Slot to RegisterType. // Returns false if the letter does not correspond to a valid register type. static bool convertToRegisterType(StringRef Slot, RegisterType *RT) { @@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) { return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray(); } +static const HLSLAttributedResourceType * +getResourceArrayHandleType(VarDecl *VD) { + assert(VD->getType()->isHLSLResourceRecordArray() && + "expected array of resource records"); + const Type *Ty = VD->getType()->getUnqualifiedDesugaredType(); + while (const ConstantArrayType *CAT = dyn_cast(Ty)) { +Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); + } + return HLSLAttributedResourceType::findHandleTypeOnResource(Ty); +} + // Returns true if the type is a leaf element type that is not valid to be // included in HLSL Buffer, such as a resource class, empty struct, zero-sized // array, or a builtin intangible type. Returns false it is a valid leaf element @@ -568,16 +583,13 @@ void createHostLayoutStructForBuffer(Sema &S, HLSLBufferDecl *BufDecl) { BufDecl->addLayoutStruct(LS); } -static void addImplicitBindingAttrToBuffer(Sema &S, HLSLBufferDecl *BufDecl, - uint32_t ImplicitBindingOrderID) { - RegisterType RT = - BufDecl->isCBuffer() ? RegisterType::CBuffer : RegisterType::SRV; +static void addImplicitBindingAttrToDecl(Sema &S, Decl *D, RegisterType RT, + uint32_t ImplicitBindingOrderID) { auto *Attr = HLSLResourceBindingAttr::CreateImplicit(S.getASTContext(), "", "0", {}); - std::optional RegSlot; - Attr->setBinding(RT, RegSlot, 0); + Attr->setBinding(RT, std::nullopt, 0); Attr->setImplicitBindingOrderID(ImplicitBindingOrderID); - BufDecl->addAttr(Attr); + D->addAttr(Attr); } // Handle end of cbuffer/tbuffer declaration @@ -600,7 +612,10 @@ void SemaHLSL::ActOnFinishBuffer(Decl *Dcl, SourceLocation RBrace) { if (RBA) RBA->setImplicitBindingOrderID(OrderID); else - addImplicitBindingAttrToBuffer(SemaRef, BufDecl, OrderID); + addImplicitBindingAttrToDecl(SemaRef, BufDecl, + BufDecl->isCBuffer() ? 
RegisterType::CBuffer +: RegisterType::SRV, + OrderID); } SemaRef.PopDeclContext(); @@ -1906,7 +1921,7 @@ static bool DiagnoseLocalRegisterBinding(Sema &S, SourceLocation &ArgLoc, if (const HLSLAttributedResourceType *AttrResType = HLSLAttributedResourceType::findHandleTypeOnResource( VD->getType().getTypePtr())) { -if (RegType == getRegisterType(AttrResType->getAttrs().ResourceClass)) +if (RegType == getRegisterType(AttrResType)) return true; S.Diag(D->getLocation(), diag::err_hlsl_binding_type_mismatch) @@ -2439,8 +2454,8 @@ void SemaHLSL::ActOnEndOfTranslationUnit(TranslationUnitDecl *TU) { HLSLBufferDecl *DefaultCBuffer = HLSLBufferDecl::CreateDefaultCBuffer( SemaRef.getASTContext(), SemaRef.getCurLexicalContext(), DefaultCBufferDecls); -addImplicitBindingAttrToBuffer(SemaRef, DefaultCBuffer, - getNextImplicitBindingOrderID()); +addImplicitBindingAttrToDecl(SemaRef, DefaultCBuffer, RegisterType::CBuffer, + getNextImplicitBindingOrderID()); SemaRef.getCurLexicalContext()->addDecl(DefaultCBuffer); createHostLayoutStructForBuffer(SemaRef, DefaultCBuffer); @@ -3640,6 +3655,24 @@ void SemaHLSL::ActOnVariableDeclarator(VarDecl *VD) { // process explicit bindings processExplicitBindingsOnDec
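For context on the Sema change described above (an implicit binding is synthesized only when the resource array carries no explicit one), a minimal hypothetical sketch of that gating check could look like this. The helper name is mine; `isHLSLResourceRecordArray` and `HLSLResourceBindingAttr` are taken from the patches quoted above:

```cpp
// Hypothetical sketch of when SemaHLSL would synthesize an implicit binding:
// the variable is a resource array and has no explicit register(...) binding.
#include "clang/AST/Attr.h"
#include "clang/AST/Decl.h"

static bool needsImplicitBinding(const clang::VarDecl *VD) {
  if (!VD->getType()->isHLSLResourceRecordArray())
    return false; // only resource arrays are affected by this patch
  // An explicit register(...) annotation already attached a binding attribute.
  return !VD->hasAttr<clang::HLSLResourceBindingAttr>();
}
```

When such a check passes, the patch attaches an implicit `HLSLResourceBindingAttr` whose payload is the next implicit binding order ID, which codegen later uses to construct the individual resources in the array.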
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
https://github.com/gbossu created https://github.com/llvm/llvm-project/pull/152553 They use extract shuffles for fixed vectors, and llvm.vector.splice intrinsics for scalable vectors. In the previous tests using ld+extract+st, the extract was optimized away and replaced by a smaller load at the right offset. This meant we didin't really test the vector_splice ISD node. **This is a chained PR** From a6be08b2dd026b6b3dcd7ca8ed5e231671a160b3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ga=C3=ABtan=20Bossu?= Date: Wed, 6 Aug 2025 10:32:44 + Subject: [PATCH] [AArch64][ISel] Extend vector_splice tests (NFC) They use extract shuffles for fixed vectors, and llvm.vector.splice intrinsics for scalable vectors. In the previous tests using ld+extract+st, the extract was optimized away and replaced by a smaller load at the right offset. This meant we didin't really test the vector_splice ISD node. --- .../sve-fixed-length-extract-subvector.ll | 368 +- .../test/CodeGen/AArch64/sve-vector-splice.ll | 162 2 files changed, 526 insertions(+), 4 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sve-vector-splice.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll index 2dd3269a2..800f95d97af4c 100644 --- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll @@ -5,6 +5,12 @@ target triple = "aarch64-unknown-linux-gnu" +; Note that both the vector.extract intrinsics and SK_ExtractSubvector +; shufflevector instructions get detected as a extract_subvector ISD node in +; SelectionDAG. We'll test both cases for the sake of completeness, even though +; vector.extract intrinsics should get lowered into shufflevector by the time we +; reach the backend. + ; i8 ; Don't use SVE for 64-bit vectors. 
@@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) vscale_range(2,0) #0 { ret void } +define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:str q1, [x1] +; CHECK-NEXT:str q0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %hi, ptr %out + %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %lo, ptr %out2 + ret void +} + +define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_half_unaligned: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4 +; CHECK-NEXT:str q0, [x1] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %d, ptr %out + ret void +} + +define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, ptr %out4) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_quarters: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:mov z2.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24 +; CHECK-NEXT:str d1, [x1] +; CHECK-NEXT:str d2, [x2] +; CHECK-NEXT:str d0, [x3] +; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8 +; CHECK-NEXT:str d0, [x4] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hilo, ptr %out + %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hihi, ptr %out2 + %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %lolo, ptr %out3 + %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %lohi, ptr %out4 + ret void +} + define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ; CHECK-LABEL: extract_subvector_v64i8: ; CHECK: // %bb.0: @@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ret void } +define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(4,4) { +; CHECK-LABEL: extract_v64i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:ptrue p0.b, vl32 +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32 +; CHECK-NEXT:st1b { z1.b }, p0, [x1] +; CHECK-NEXT:st1b { z0.b }, p0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <64 x i8>, ptr %in + %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> + store <32 x i8> %hi, ptr
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Gaëtan Bossu (gbossu) Changes They use extract shuffles for fixed vectors, and llvm.vector.splice intrinsics for scalable vectors. In the previous tests using ld+extract+st, the extract was optimized away and replaced by a smaller load at the right offset. This meant we didin't really test the vector_splice ISD node. **This is a chained PR** --- Patch is 27.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152553.diff 2 Files Affected: - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll (+364-4) - (added) llvm/test/CodeGen/AArch64/sve-vector-splice.ll (+162) ``diff diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll index 2dd3269a2..800f95d97af4c 100644 --- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll @@ -5,6 +5,12 @@ target triple = "aarch64-unknown-linux-gnu" +; Note that both the vector.extract intrinsics and SK_ExtractSubvector +; shufflevector instructions get detected as a extract_subvector ISD node in +; SelectionDAG. We'll test both cases for the sake of completeness, even though +; vector.extract intrinsics should get lowered into shufflevector by the time we +; reach the backend. + ; i8 ; Don't use SVE for 64-bit vectors. @@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) vscale_range(2,0) #0 { ret void } +define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:str q1, [x1] +; CHECK-NEXT:str q0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %hi, ptr %out + %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %lo, ptr %out2 + ret void +} + +define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_half_unaligned: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4 +; CHECK-NEXT:str q0, [x1] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %d, ptr %out + ret void +} + +define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, ptr %out4) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_quarters: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:mov z2.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24 +; CHECK-NEXT:str d1, [x1] +; CHECK-NEXT:str d2, [x2] +; CHECK-NEXT:str d0, [x3] +; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8 +; CHECK-NEXT:str d0, [x4] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hilo, ptr %out + %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hihi, ptr %out2 + %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %lolo, ptr %out3 + %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+ store <8 x i8> %lohi, ptr %out4 + ret void +} + define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ; CHECK-LABEL: extract_subvector_v64i8: ; CHECK: // %bb.0: @@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ret void } +define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(4,4) { +; CHECK-LABEL: extract_v64i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:ptrue p0.b, vl32 +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32 +; CHECK-NEXT:st1b { z1.b }, p0, [x1] +; CHECK-NEXT:st1b { z0.b }, p0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <64 x i8>, ptr %in + %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> + store <32 x i8> %hi, ptr %out + %lo = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> + store <32 x i8> %lo, ptr %out2 + ret void +} + define void @extract_subvector_v128i8(ptr %a, ptr %b) vscale_range(8,0) #0 { ; CHECK-LABEL: extract_subvector_v128i8: ; CHECK: // %bb.0: @@ -117,6 +203,24 @@ define void @extract_subvector_v16i16(ptr %a, ptr %b) vscale_range(2,0) #0 { ret void } +define void @extract_v16i16_halves(ptr %in, ptr
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
=?utf-8?q?Gaëtan?= Bossu Message-ID: In-Reply-To: llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Gaëtan Bossu (gbossu) Changes The patch changes existing patterns to select the EXT_ZZZI pseudo instead of the EXT_ZZI destructive instruction for vector_splice. Given that registers aren't tied anymore, this gives the register allocator more freedom and a lot of MOVs get replaced with MOVPRFX. In some cases however, we could have just chosen the same input and output register, but regalloc preferred not to. This means we end up with some test cases now having more instructions: there is now a MOVPRFX while no MOV was previously needed. --- Patch is 154.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152554.diff 21 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp (+7-3) - (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+4-4) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll (+21-20) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-fp-to-int.ll (+24-20) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-extends.ll (+30-24) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll (+20-20) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-to-fp.ll (+24-20) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-limit-duplane.ll (+8-6) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-masked-loads.ll (+70-56) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-partial-reduce.ll (+14-14) - (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll (+21-20) - (modified) llvm/test/CodeGen/AArch64/sve-pr92779.ll (+9-9) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-extend-trunc.ll (+15-12) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-to-int.ll (+150-136) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll (+413-327) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll (+108-108) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll (+152-132) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-limit-duplane.ll (+8-7) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll (+14-12) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll (+20-18) - (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-reductions.ll (+52-42) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp index cdf2822f3ed9d..b7d69b68af4ee 100644 --- a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp +++ b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp @@ -53,9 +53,6 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) { if (skipFunction(MF.getFunction())) return false; - AArch64FunctionInfo *FuncInfo = MF.getInfo(); - if (!FuncInfo->hasStreamingModeChanges()) -return false; MRI = &MF.getRegInfo(); LIS = &getAnalysis().getLIS(); @@ -86,6 +83,13 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) { Changed = true; break; } + case AArch64::EXT_ZZZI: +Register DstReg = MI.getOperand(0).getReg(); +Register SrcReg1 = MI.getOperand(1).getReg(); +if (SrcReg1 != DstReg) { + MRI->setRegAllocationHint(DstReg, 0, SrcReg1); +} +break; } } } diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td 
b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td index 85e647af6684c..a3ca0cb73cd43 100644 --- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td +++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td @@ -2135,19 +2135,19 @@ let Predicates = [HasSVE_or_SME] in { // Splice with lane bigger or equal to 0 foreach VT = [nxv16i8] in def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index, - (EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; + (EXT_ZZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; foreach VT = [nxv8i16, nxv8f16, nxv8bf16] in def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_127 i32:$index, - (EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; + (EXT_ZZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; foreach VT = [nxv4i32, nxv4f16, nxv4f32, nxv4bf16] in def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_63 i32:$index, - (EXT_ZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; + (EXT_ZZZI ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>; foreach VT = [nxv2i64, nxv2f16, nxv2f32, nxv2f64, nxv2bf16] in def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_31 i32:$index, -
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
https://github.com/TIFitis updated https://github.com/llvm/llvm-project/pull/151989 >From e9b6766c5fbfd25b5acfc686cbdc41f8dd727b03 Mon Sep 17 00:00:00 2001 From: Akash Banerjee Date: Thu, 31 Jul 2025 19:48:15 +0100 Subject: [PATCH 1/2] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR Add a new AutomapToTargetData pass. This gathers the declare target enter variables which have the AUTOMAP modifier. And adds omp.declare_target_enter/exit mapping directives for fir.alloca and fir.free oeprations on the AUTOMAP enabled variables. --- .../include/flang/Optimizer/OpenMP/Passes.td | 11 ++ .../Optimizer/OpenMP/AutomapToTargetData.cpp | 171 ++ flang/lib/Optimizer/OpenMP/CMakeLists.txt | 1 + flang/lib/Optimizer/Passes/Pipelines.cpp | 12 +- .../Transforms/omp-automap-to-target-data.fir | 40 .../fortran/declare-target-automap.f90| 36 6 files changed, 265 insertions(+), 6 deletions(-) create mode 100644 flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp create mode 100644 flang/test/Transforms/omp-automap-to-target-data.fir create mode 100644 offload/test/offloading/fortran/declare-target-automap.f90 diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td index 704faf0ccd856..0bff58f0f6394 100644 --- a/flang/include/flang/Optimizer/OpenMP/Passes.td +++ b/flang/include/flang/Optimizer/OpenMP/Passes.td @@ -112,4 +112,15 @@ def GenericLoopConversionPass ]; } +def AutomapToTargetDataPass +: Pass<"omp-automap-to-target-data", "::mlir::ModuleOp"> { + let summary = "Insert OpenMP target data operations for AUTOMAP variables"; + let description = [{ +Inserts `omp.target_enter_data` and `omp.target_exit_data` operations to +map variables marked with the `AUTOMAP` modifier when their allocation +or deallocation is detected in the FIR. + }]; + let dependentDialects = ["mlir::omp::OpenMPDialect"]; +} + #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp new file mode 100644 index 0..c4937f1e90ee3 --- /dev/null +++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp @@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. 
+ bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
index b7d69b68a..9d3e9105f 100644
--- a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
@@ -53,7 +53,6 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
 
-
   MRI = &MF.getRegInfo();
   LIS = &getAnalysis().getLIS();
   bool Changed = false;
```

https://github.com/llvm/llvm-project/pull/152554
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
llvmbot wrote: @llvm/pr-subscribers-clang Author: None (llvmbot) Changes Backport 8de481913353a1e37264687d5cc73db0de19e6cc Requested by: @Meinersbur --- Full diff: https://github.com/llvm/llvm-project/pull/152458.diff 1 Files Affected: - (modified) clang/lib/Driver/ToolChain.cpp (+21-8) ``diff diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 3f9b808b2722e..07a3ae925f96d 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -837,17 +837,30 @@ void ToolChain::addFortranRuntimeLibs(const ArgList &Args, void ToolChain::addFortranRuntimeLibraryPath(const llvm::opt::ArgList &Args, ArgStringList &CmdArgs) const { - // Default to the /../lib directory. This works fine on the - // platforms that we have tested so far. We will probably have to re-fine - // this in the future. In particular, on some platforms, we may need to use - // lib64 instead of lib. + auto AddLibSearchPathIfExists = [&](const Twine &Path) { +// Linker may emit warnings about non-existing directories +if (!llvm::sys::fs::is_directory(Path)) + return; + +if (getTriple().isKnownWindowsMSVCEnvironment()) + CmdArgs.push_back(Args.MakeArgString("-libpath:" + Path)); +else + CmdArgs.push_back(Args.MakeArgString("-L" + Path)); + }; + + // Search for flang_rt.* at the same location as clang_rt.* with + // LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0. On most platforms, flang_rt is + // located at the path returned by getRuntimePath() which is already added to + // the library search path. This exception is for Apple-Darwin. + AddLibSearchPathIfExists(getCompilerRTPath()); + + // Fall back to the non-resource directory /../lib. We will + // probably have to refine this in the future. In particular, on some + // platforms, we may need to use lib64 instead of lib. SmallString<256> DefaultLibPath = llvm::sys::path::parent_path(getDriver().Dir); llvm::sys::path::append(DefaultLibPath, "lib"); - if (getTriple().isKnownWindowsMSVCEnvironment()) -CmdArgs.push_back(Args.MakeArgString("-libpath:" + DefaultLibPath)); - else -CmdArgs.push_back(Args.MakeArgString("-L" + DefaultLibPath)); + AddLibSearchPathIfExists(DefaultLibPath); } void ToolChain::addFlangRTLibPath(const ArgList &Args, `` https://github.com/llvm/llvm-project/pull/152458 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [llvm][cmake] Turn runtime in PROJECTS warnings into FATAL_ERROR (PR #152302)
https://github.com/DavidSpickett demilestoned https://github.com/llvm/llvm-project/pull/152302
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/152458 Backport 8de481913353a1e37264687d5cc73db0de19e6cc Requested by: @Meinersbur >From 8a59c3705a92e904f9cdcbfe73342d6197659db0 Mon Sep 17 00:00:00 2001 From: Michael Kruse Date: Wed, 6 Aug 2025 16:58:08 +0200 Subject: [PATCH] [Flang] Search flang_rt in clang_rt path (#151954) The clang/flang driver has two separate systems for find the location of clang_rt (simplified): * `getCompilerRTPath()`, e.g. `../lib/clang/22/lib/windows`, used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0` * `getRuntimePath()`, e.g. `../lib/clang/22/lib/x86_64-pc-windows-msvc`, used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1` To simplify the search path, Flang-RT normally assumes only `getRuntimePath()`, i.e. ignoring `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` and always using the `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1` mechanism. There is an exception for Apple Darwin triples where `getRuntimePath()` returns nothing. The flang-rt/compiler-rt CMake code for library location also ignores `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` but uses the `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0` path instead. Since only `getRuntimePath()` is automatically added to the linker command line, this patch explicitly adds `getCompilerRTPath()` to the path when linking flang_rt. Fixes #151031 (cherry picked from commit 8de481913353a1e37264687d5cc73db0de19e6cc) --- clang/lib/Driver/ToolChain.cpp | 29 + 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index 3f9b808b2722e..07a3ae925f96d 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -837,17 +837,30 @@ void ToolChain::addFortranRuntimeLibs(const ArgList &Args, void ToolChain::addFortranRuntimeLibraryPath(const llvm::opt::ArgList &Args, ArgStringList &CmdArgs) const { - // Default to the /../lib directory. This works fine on the - // platforms that we have tested so far. We will probably have to re-fine - // this in the future. In particular, on some platforms, we may need to use - // lib64 instead of lib. + auto AddLibSearchPathIfExists = [&](const Twine &Path) { +// Linker may emit warnings about non-existing directories +if (!llvm::sys::fs::is_directory(Path)) + return; + +if (getTriple().isKnownWindowsMSVCEnvironment()) + CmdArgs.push_back(Args.MakeArgString("-libpath:" + Path)); +else + CmdArgs.push_back(Args.MakeArgString("-L" + Path)); + }; + + // Search for flang_rt.* at the same location as clang_rt.* with + // LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0. On most platforms, flang_rt is + // located at the path returned by getRuntimePath() which is already added to + // the library search path. This exception is for Apple-Darwin. + AddLibSearchPathIfExists(getCompilerRTPath()); + + // Fall back to the non-resource directory /../lib. We will + // probably have to refine this in the future. In particular, on some + // platforms, we may need to use lib64 instead of lib. SmallString<256> DefaultLibPath = llvm::sys::path::parent_path(getDriver().Dir); llvm::sys::path::append(DefaultLibPath, "lib"); - if (getTriple().isKnownWindowsMSVCEnvironment()) -CmdArgs.push_back(Args.MakeArgString("-libpath:" + DefaultLibPath)); - else -CmdArgs.push_back(Args.MakeArgString("-L" + DefaultLibPath)); + AddLibSearchPathIfExists(DefaultLibPath); } void ToolChain::addFlangRTLibPath(const ArgList &Args, ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
llvmbot wrote: @carlocab What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
https://github.com/carlocab approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
carlocab wrote: Probably needs merged by a release manager? https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Use device shared memory for arg structures (PR #150925)
Meinersbur wrote:
> Having said that callbacks are all over the place in `OMPIRBuilder`.

There is a term for it: [Callback hell](https://en.wiktionary.org/wiki/callback_hell)

https://github.com/llvm/llvm-project/pull/150925
[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)
Meinersbur wrote:
> Probably needs merged by a release manager?

Yes, the release manager's workflow is detailed here: https://llvm.org/docs/HowToReleaseLLVM.html#triaging-bug-reports-for-releases

https://github.com/llvm/llvm-project/pull/152458
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin())->getResult(0); + +for (Operation *refUser : declaredRef.getUsers()) { + if (auto storeOp = dyn_cast(refUser)) +if (auto emboxOp = storeOp.getValue().getDefiningOp()) + if (auto allocmemOp = + emboxOp.getOperand(0).getDefiningOp()) +allocmems.push_back(storeOp); + + if (auto loadOp = dyn_cast(refUser)) +for (Operation *loadUser : loadOp.getResult().getUsers()) + if (auto boxAddrOp = dyn_cast(loadUser)) +for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers()) + if (auto freememOp = dyn_cast(boxAddrUser)) +freemems.push_back(loadOp); +} + } + + void runOnOperation() override { +ModuleOp module = getOperation()->getParentOfType(); +if (!module) + module = dyn_cast(getOperation()); +if (!module) + return; + +// Build FIR builder for helper utilities. +fir::KindMapping kindMap = fir::getKindMapping(module); +fir::FirOpBuilder builder{module, std::move(kindMap)}; + +// Collect global variables with AUTOMAP flag. 
+llvm::DenseSet automapGlobals; +module.walk([&](fir::GlobalOp globalOp) { + if (auto iface = + dyn_cast(globalOp.getOperation())) +if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap()) + automapGlobals.insert(globalOp); +}); + +for (fir::GlobalOp globalOp : automapGlobals) + if (auto uses = globalOp.getSymbolUses(module.getOperation())) +for (auto &x : *uses) + if (auto addrOp = dyn_cast(x.getUser())) { +llvm::SmallVector allocstores; +llvm::SmallVector freememloads; +findRelatedAllocmemFreemem(addrOp, allocstores, freememloads); + +for (auto storeOp : allocstores) { skatrak wrote: There's quite some code duplication between these two loops. I think it's worth refactoring into a lambda or template function. ```c++ auto processTargetDataClauses = [&](auto op, llvm::omp::OpenMPOffloadMappingFlags flags) -> omp::TargetEnterExitUpdateDataOperands { ... }; for (auto storeOp : allocmemStores) { auto clauses = processLoadStore(storeOp); builder.create(storeOp.getLoc(), clauses); } for (au
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin())->getResult(0); + +for (Operation *refUser : declaredRef.getUsers()) { + if (auto storeOp = dyn_cast(refUser)) +if (auto emboxOp = storeOp.getValue().getDefiningOp()) + if (auto allocmemOp = + emboxOp.getOperand(0).getDefiningOp()) +allocmems.push_back(storeOp); + + if (auto loadOp = dyn_cast(refUser)) +for (Operation *loadUser : loadOp.getResult().getUsers()) + if (auto boxAddrOp = dyn_cast(loadUser)) +for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers()) + if (auto freememOp = dyn_cast(boxAddrUser)) +freemems.push_back(loadOp); +} + } + + void runOnOperation() override { +ModuleOp module = getOperation()->getParentOfType(); +if (!module) + module = dyn_cast(getOperation()); +if (!module) + return; + +// Build FIR builder for helper utilities. +fir::KindMapping kindMap = fir::getKindMapping(module); +fir::FirOpBuilder builder{module, std::move(kindMap)}; + +// Collect global variables with AUTOMAP flag. 
+llvm::DenseSet automapGlobals; +module.walk([&](fir::GlobalOp globalOp) { + if (auto iface = + dyn_cast(globalOp.getOperation())) +if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap()) + automapGlobals.insert(globalOp); +}); + +for (fir::GlobalOp globalOp : automapGlobals) + if (auto uses = globalOp.getSymbolUses(module.getOperation())) +for (auto &x : *uses) + if (auto addrOp = dyn_cast(x.getUser())) { +llvm::SmallVector allocstores; +llvm::SmallVector freememloads; +findRelatedAllocmemFreemem(addrOp, allocstores, freememloads); skatrak wrote: Would it be possible to first gather all stores and loads for all uses and then process them? That way we wouldn't have to allocate/deallocate these lists for each use. Something like: ```c++ for (fir::GlobalOp globalOp : automapGlobals) { if (auto uses = globalOp.getSymbolUses(module.getOperation())) { llvm::SmallVector allocmemStores; llvm::SmallVector freememLoads; for (auto &x : *uses) if (auto addrOp = dyn_cast(x.getUser())) fi
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin())->getResult(0); + +for (Operation *refUser : declaredRef.getUsers()) { + if (auto storeOp = dyn_cast(refUser)) +if (auto emboxOp = storeOp.getValue().getDefiningOp()) + if (auto allocmemOp = + emboxOp.getOperand(0).getDefiningOp()) +allocmems.push_back(storeOp); + + if (auto loadOp = dyn_cast(refUser)) +for (Operation *loadUser : loadOp.getResult().getUsers()) + if (auto boxAddrOp = dyn_cast(loadUser)) +for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers()) + if (auto freememOp = dyn_cast(boxAddrUser)) +freemems.push_back(loadOp); +} + } + + void runOnOperation() override { +ModuleOp module = getOperation()->getParentOfType(); +if (!module) + module = dyn_cast(getOperation()); +if (!module) + return; + +// Build FIR builder for helper utilities. +fir::KindMapping kindMap = fir::getKindMapping(module); +fir::FirOpBuilder builder{module, std::move(kindMap)}; + +// Collect global variables with AUTOMAP flag. 
+llvm::DenseSet automapGlobals; +module.walk([&](fir::GlobalOp globalOp) { + if (auto iface = + dyn_cast(globalOp.getOperation())) +if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap()) skatrak wrote: I think this is missing a check for the declare target type: `iface.getDeclareTargetDeviceType()`. Otherwise, this results in mapping `declare target device_type(host)` globals. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin())->getResult(0); + +for (Operation *refUser : declaredRef.getUsers()) { + if (auto storeOp = dyn_cast(refUser)) +if (auto emboxOp = storeOp.getValue().getDefiningOp()) + if (auto allocmemOp = + emboxOp.getOperand(0).getDefiningOp()) +allocmems.push_back(storeOp); + + if (auto loadOp = dyn_cast(refUser)) +for (Operation *loadUser : loadOp.getResult().getUsers()) + if (auto boxAddrOp = dyn_cast(loadUser)) +for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers()) + if (auto freememOp = dyn_cast(boxAddrUser)) +freemems.push_back(loadOp); +} + } + + void runOnOperation() override { +ModuleOp module = getOperation()->getParentOfType(); +if (!module) + module = dyn_cast(getOperation()); +if (!module) + return; + +// Build FIR builder for helper utilities. +fir::KindMapping kindMap = fir::getKindMapping(module); +fir::FirOpBuilder builder{module, std::move(kindMap)}; + +// Collect global variables with AUTOMAP flag. 
+llvm::DenseSet automapGlobals; +module.walk([&](fir::GlobalOp globalOp) { + if (auto iface = + dyn_cast(globalOp.getOperation())) +if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap()) + automapGlobals.insert(globalOp); +}); + +for (fir::GlobalOp globalOp : automapGlobals) + if (auto uses = globalOp.getSymbolUses(module.getOperation())) +for (auto &x : *uses) + if (auto addrOp = dyn_cast(x.getUser())) { +llvm::SmallVector allocstores; +llvm::SmallVector freememloads; skatrak wrote: ```suggestion llvm::SmallVector allocmemStores; llvm::SmallVector freememLoads; ``` https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { skatrak wrote: I agree. Maybe flang/include/flang/Support/OpenMP-utils.h and flang/lib/Support/OpenMP-utils.cpp could be where this logic can be shared between lowering and this pass. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
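A rough sketch of how the two helpers could be exposed from a shared support header; the location and namespace are assumptions based on the suggestion above, and the signatures mirror the quoted patch rather than an actual change:

```cpp
// Sketch for flang/include/flang/Support/OpenMP-utils.h (assumed location and
// namespace): helpers shared between OpenMP lowering and the
// AutomapToTargetData pass.
namespace Fortran::common::openmp {

/// Returns true if \p var points to data with dynamic size, which therefore
/// needs omp.map.bounds operations to describe its extents.
bool needsBoundsOps(mlir::Value var);

/// Generates omp.map.bounds operations for \p var when required and appends
/// them to \p boundsOps.
void genBoundsOps(fir::FirOpBuilder &builder, mlir::Value var,
                  llvm::SmallVectorImpl<mlir::Value> &boundsOps);

} // namespace Fortran::common::openmp
```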
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,40 @@ +// RUN: fir-opt --omp-automap-to-target-data %s | FileCheck %s +// Test OMP AutomapToTargetData pass. + +module { + fir.global + @_QMtestEarr{omp.declare_target = #omp.declaretarget} target + : !fir.box>> + + func.func @automap() { +%c0 = arith.constant 0 : index +%c10 = arith.constant 10 : i32 +%addr = fir.address_of(@_QMtestEarr) : !fir.ref>>> +%decl:2 = hlfir.declare %addr {fortran_attrs = #fir.var_attrs, uniq_name = "_QMtestEarr"} : (!fir.ref>>>) -> (!fir.ref>>>, !fir.ref>>>) +%idx = fir.convert %c10 : (i32) -> index +%cond = arith.cmpi sgt, %idx, %c0 : index +%n = arith.select %cond, %idx, %c0 : index +%mem = fir.allocmem !fir.array, %n {fir.must_be_heap = true} +%shape = fir.shape %n : (index) -> !fir.shape<1> +%box = fir.embox %mem(%shape) : (!fir.heap>, !fir.shape<1>) -> !fir.box>> +fir.store %box to %decl#0 : !fir.ref>>> +%ld = fir.load %decl#0 : !fir.ref>>> +%base = fir.box_addr %ld : (!fir.box>>) -> !fir.heap> +fir.freemem %base : !fir.heap> +%undef = fir.zero_bits !fir.heap> +%sh0 = fir.shape %c0 : (index) -> !fir.shape<1> +%empty = fir.embox %undef(%sh0) : (!fir.heap>, !fir.shape<1>) -> !fir.box>> +fir.store %empty to %decl#0 : !fir.ref>>> +return + } +} + +// CHECK-LABEL: func.func @automap() +// CHECK: fir.allocmem +// CHECK: fir.store +// CHECK: omp.map.info {{.*}}map_clauses(to) +// CHECK: omp.target_enter_data +// CHECK: omp.map.info {{.*}}map_clauses(delete) +// CHECK: omp.target_exit_data +// CHECK: fir.freemem skatrak wrote: Nit: I think we should also test how values defined by these ops are passed to the other ops, not just checking that the expected ops are there. Also it would be good to check that uses of the global variable are placed between the `omp.target_enter_data` and `omp.target_exit_data`. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -316,13 +316,13 @@ void createOpenMPFIRPassPipeline(mlir::PassManager &pm, pm.addPass(flangomp::createDoConcurrentConversionPass( opts.doConcurrentMappingKind == DoConcurrentMappingKind::DCMK_Device)); - // The MapsForPrivatizedSymbols pass needs to run before - // MapInfoFinalizationPass because the former creates new - // MapInfoOp instances, typically for descriptors. - // MapInfoFinalizationPass adds MapInfoOp instances for the descriptors - // underlying data which is necessary to access the data on the offload - // target device. + // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass needs to run + // before MapInfoFinalizationPass because the former creates new MapInfoOp skatrak wrote: ```suggestion // before MapInfoFinalizationPass because they create new MapInfoOp ``` https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin())->getResult(0); + +for (Operation *refUser : declaredRef.getUsers()) { + if (auto storeOp = dyn_cast(refUser)) +if (auto emboxOp = storeOp.getValue().getDefiningOp()) + if (auto allocmemOp = + emboxOp.getOperand(0).getDefiningOp()) +allocmems.push_back(storeOp); + + if (auto loadOp = dyn_cast(refUser)) +for (Operation *loadUser : loadOp.getResult().getUsers()) + if (auto boxAddrOp = dyn_cast(loadUser)) +for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers()) + if (auto freememOp = dyn_cast(boxAddrUser)) +freemems.push_back(loadOp); +} + } + + void runOnOperation() override { +ModuleOp module = getOperation()->getParentOfType(); +if (!module) + module = dyn_cast(getOperation()); +if (!module) + return; + +// Build FIR builder for helper utilities. +fir::KindMapping kindMap = fir::getKindMapping(module); +fir::FirOpBuilder builder{module, std::move(kindMap)}; + +// Collect global variables with AUTOMAP flag. 
+llvm::DenseSet automapGlobals; +module.walk([&](fir::GlobalOp globalOp) { + if (auto iface = + dyn_cast(globalOp.getOperation())) +if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap()) + automapGlobals.insert(globalOp); +}); + +for (fir::GlobalOp globalOp : automapGlobals) + if (auto uses = globalOp.getSymbolUses(module.getOperation())) +for (auto &x : *uses) skatrak wrote: Nit: Use braces here: [link](https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements). > Similarly, braces should be used when a single-statement body is complex > enough that it becomes difficult to see where the block containing the > following statement began. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -316,13 +316,13 @@ void createOpenMPFIRPassPipeline(mlir::PassManager &pm, pm.addPass(flangomp::createDoConcurrentConversionPass( opts.doConcurrentMappingKind == DoConcurrentMappingKind::DCMK_Device)); - // The MapsForPrivatizedSymbols pass needs to run before - // MapInfoFinalizationPass because the former creates new - // MapInfoOp instances, typically for descriptors. - // MapInfoFinalizationPass adds MapInfoOp instances for the descriptors - // underlying data which is necessary to access the data on the offload - // target device. + // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass needs to run skatrak wrote: ```suggestion // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass need to run ``` https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
https://github.com/skatrak commented: Thank you Akash, a couple of minor comments from me. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
https://github.com/skatrak edited https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)
https://github.com/Meinersbur approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/152493 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
https://github.com/gbossu edited https://github.com/llvm/llvm-project/pull/152553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
@@ -0,0 +1,162 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mattr=+sve -verify-machineinstrs < %s | FileCheck %s +; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s + +target triple = "aarch64-unknown-linux-gnu" + +; Test vector_splice patterns. +; Note that this test is similar to named-vector-shuffles-sve.ll, but it focuses +; on testing all supported types, and a positive "splice index". + + +; i8 elements +define @splice_nxv16i8( %a, %b) { +; CHECK-LABEL: splice_nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #1 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv16i8( %a, %b, i32 1) + ret %res +} + +; i16 elements +define @splice_nxv8i16( %a, %b) { +; CHECK-LABEL: splice_nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv8i16( %a, %b, i32 1) + ret %res +} + +; bf16 elements + +define @splice_nxv8bfloat( %a, %b) { +; CHECK-LABEL: splice_nxv8bfloat: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv8bfloat( %a, %b, i32 1) + ret %res +} + +define @splice_nxv4bfloat( %a, %b) { +; CHECK-LABEL: splice_nxv4bfloat: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #4 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv4bfloat( %a, %b, i32 1) + ret %res +} gbossu wrote: ⚠️ Similar to what I had metionned in a closed PR: https://github.com/llvm/llvm-project/pull/151730#discussion_r2248448988 We have patterns for `EXT_ZZI` with these "weird" types where the fixed part isn't 128-bit: - - - - - I'm not sure why they were here in the first place, and looking at the generated code, I think the patterns are wrong. https://github.com/llvm/llvm-project/pull/152553 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -150,13 +150,14 @@ define void @fcvtzu_v16f16_v16i32(ptr %a, ptr %b) #0 { ; VBITS_GE_256-NEXT:mov x8, #8 // =0x8 ; VBITS_GE_256-NEXT:ld1h { z0.h }, p0/z, [x0] ; VBITS_GE_256-NEXT:ptrue p0.s, vl8 -; VBITS_GE_256-NEXT:uunpklo z1.s, z0.h -; VBITS_GE_256-NEXT:ext z0.b, z0.b, z0.b, #16 +; VBITS_GE_256-NEXT:movprfx z1, z0 +; VBITS_GE_256-NEXT:ext z1.b, z1.b, z0.b, #16 ; VBITS_GE_256-NEXT:uunpklo z0.s, z0.h -; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h +; VBITS_GE_256-NEXT:uunpklo z1.s, z1.h ; VBITS_GE_256-NEXT:fcvtzu z0.s, p0/m, z0.h -; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1] -; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2] +; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h +; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1] +; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1, x8, lsl #2] gbossu wrote: In that example, we do get one more instruction now (the `movprfx`), but I think the schedule is actually better because we eliminate one dependency between `ext` and the second `uunpklo`. Now the two `uunpklo` can execute in parallel. This is the theme of the test updates in general: sometimes more instructions, but more freedom for the `MachineScheduler`. https://github.com/llvm/llvm-project/pull/152554 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
https://github.com/TIFitis updated https://github.com/llvm/llvm-project/pull/151989 >From e9b6766c5fbfd25b5acfc686cbdc41f8dd727b03 Mon Sep 17 00:00:00 2001 From: Akash Banerjee Date: Thu, 31 Jul 2025 19:48:15 +0100 Subject: [PATCH 1/2] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR Add a new AutomapToTargetData pass. This gathers the declare target enter variables which have the AUTOMAP modifier. And adds omp.declare_target_enter/exit mapping directives for fir.alloca and fir.free oeprations on the AUTOMAP enabled variables. --- .../include/flang/Optimizer/OpenMP/Passes.td | 11 ++ .../Optimizer/OpenMP/AutomapToTargetData.cpp | 171 ++ flang/lib/Optimizer/OpenMP/CMakeLists.txt | 1 + flang/lib/Optimizer/Passes/Pipelines.cpp | 12 +- .../Transforms/omp-automap-to-target-data.fir | 40 .../fortran/declare-target-automap.f90| 36 6 files changed, 265 insertions(+), 6 deletions(-) create mode 100644 flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp create mode 100644 flang/test/Transforms/omp-automap-to-target-data.fir create mode 100644 offload/test/offloading/fortran/declare-target-automap.f90 diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td b/flang/include/flang/Optimizer/OpenMP/Passes.td index 704faf0ccd856..0bff58f0f6394 100644 --- a/flang/include/flang/Optimizer/OpenMP/Passes.td +++ b/flang/include/flang/Optimizer/OpenMP/Passes.td @@ -112,4 +112,15 @@ def GenericLoopConversionPass ]; } +def AutomapToTargetDataPass +: Pass<"omp-automap-to-target-data", "::mlir::ModuleOp"> { + let summary = "Insert OpenMP target data operations for AUTOMAP variables"; + let description = [{ +Inserts `omp.target_enter_data` and `omp.target_exit_data` operations to +map variables marked with the `AUTOMAP` modifier when their allocation +or deallocation is detected in the FIR. + }]; + let dependentDialects = ["mlir::omp::OpenMPDialect"]; +} + #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp new file mode 100644 index 0..c4937f1e90ee3 --- /dev/null +++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp @@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. 
+ bool needsBoundsOps(Value var) { +assert(isa(var.getType()) && + "only pointer like types expected"); +Type t = fir::unwrapRefType(var.getType()); +if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t)) + return fir::hasDynamicSize(inner); +return fir::hasDynamicSize(t); + } + + // Generate MapBoundsOp operations for the variable if required. + void genBoundsOps(fir::FirOpBuilder &builder, Value var, +SmallVectorImpl &boundsOps) { +Location loc = var.getLoc(); +fir::factory::AddrAndBoundsInfo info = +fir::factory::getDataOperandBaseAddr(builder, var, + /*isOptional=*/false, loc); +fir::ExtendedValue exv = +hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr}, +/*contiguousHint=*/true) +.first; +SmallVector tmp = +fir::factory::genImplicitBoundsOps( +builder, info, exv, /*dataExvIsAssumedSize=*/false, loc); +llvm::append_range(boundsOps, tmp); + } + + void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp, + llvm::SmallVector &allocmems, + llvm::SmallVector &freemems) { +assert(addressOfOp->hasOneUse() && "op must have single use"); + +auto declaredRef = +cast(*addressOfOp->getUsers().begin
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -256,12 +256,13 @@ define @splice_nxv2f64_last_idx( %a, define @splice_nxv2i1_idx( %a, %b) #0 { ; CHECK-LABEL: splice_nxv2i1_idx: ; CHECK: // %bb.0: -; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1 ; CHECK-NEXT:mov z1.d, p0/z, #1 // =0x1 +; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1 ; CHECK-NEXT:ptrue p0.d -; CHECK-NEXT:ext z1.b, z1.b, z0.b, #8 -; CHECK-NEXT:and z1.d, z1.d, #0x1 -; CHECK-NEXT:cmpne p0.d, p0/z, z1.d, #0 +; CHECK-NEXT:mov z0.d, z1.d gbossu wrote: This is one case where we get worse due to an extra MOV that could not be turned into a MOVPRFX. This is alleviated in the next commit using register hints. https://github.com/llvm/llvm-project/pull/152554 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,40 @@ +// RUN: fir-opt --omp-automap-to-target-data %s | FileCheck %s +// Test OMP AutomapToTargetData pass. + +module { + fir.global + @_QMtestEarr{omp.declare_target = #omp.declaretarget} target + : !fir.box>> + + func.func @automap() { +%c0 = arith.constant 0 : index +%c10 = arith.constant 10 : i32 +%addr = fir.address_of(@_QMtestEarr) : !fir.ref>>> +%decl:2 = hlfir.declare %addr {fortran_attrs = #fir.var_attrs, uniq_name = "_QMtestEarr"} : (!fir.ref>>>) -> (!fir.ref>>>, !fir.ref>>>) +%idx = fir.convert %c10 : (i32) -> index +%cond = arith.cmpi sgt, %idx, %c0 : index +%n = arith.select %cond, %idx, %c0 : index +%mem = fir.allocmem !fir.array, %n {fir.must_be_heap = true} +%shape = fir.shape %n : (index) -> !fir.shape<1> +%box = fir.embox %mem(%shape) : (!fir.heap>, !fir.shape<1>) -> !fir.box>> +fir.store %box to %decl#0 : !fir.ref>>> +%ld = fir.load %decl#0 : !fir.ref>>> +%base = fir.box_addr %ld : (!fir.box>>) -> !fir.heap> +fir.freemem %base : !fir.heap> +%undef = fir.zero_bits !fir.heap> +%sh0 = fir.shape %c0 : (index) -> !fir.shape<1> +%empty = fir.embox %undef(%sh0) : (!fir.heap>, !fir.shape<1>) -> !fir.box>> +fir.store %empty to %decl#0 : !fir.ref>>> +return + } +} + +// CHECK-LABEL: func.func @automap() +// CHECK: fir.allocmem +// CHECK: fir.store +// CHECK: omp.map.info {{.*}}map_clauses(to) +// CHECK: omp.target_enter_data +// CHECK: omp.map.info {{.*}}map_clauses(delete) +// CHECK: omp.target_exit_data +// CHECK: fir.freemem TIFitis wrote: I've updated the test to make sure it's mapping the automap global. The test checks that the `target_enter_data` succeeds the `allocmem` operation and the `target_exit_data` precedes the `freemem` operation which should imply any other use of the global in between would remain intact. Let me know if you're happy with the updated test. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -86,6 +83,13 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) { Changed = true; break; } + case AArch64::EXT_ZZZI: +Register DstReg = MI.getOperand(0).getReg(); +Register SrcReg1 = MI.getOperand(1).getReg(); +if (SrcReg1 != DstReg) { + MRI->setRegAllocationHint(DstReg, 0, SrcReg1); +} +break; gbossu wrote: Note that this commit is really just a WIP to show we can slightly improve codegen with some hints. I'm not sure it should remain in that PR. https://github.com/llvm/llvm-project/pull/152554 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -3640,6 +3655,24 @@ void SemaHLSL::ActOnVariableDeclarator(VarDecl *VD) { // process explicit bindings processExplicitBindingsOnDecl(VD); + +if (VD->getType()->isHLSLResourceRecordArray()) { + // If the resource array does not have an explicit binding attribute, + // create an implicit one. It will be used to transfer implicit binding + // order_ID to codegen. + if (!VD->hasAttr()) { bob80905 wrote: Shouldn't this check if it's missing HLSLResourceBindingAttr? Or is this saying that HLSLVkBindingAttr is only added when a binding attribute is explicitly spelled out? https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
https://github.com/hekota updated https://github.com/llvm/llvm-project/pull/152452 >From 4e153a4da8b990a1d07d6d1d63d2be74ed45e2eb Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Thu, 7 Aug 2025 00:37:23 -0700 Subject: [PATCH 1/2] [HLSL] Add implicit binding attribute to resource arrays without binding and make them static If a resource array does not have an explicit binding attribute, SemaHLSL will add an implicit one. The attribute will be used to transfer implicit binding order ID to the codegen, the same way as it is done for HLSLBufferDecls. This is necessary in order to generate correct initialization of resources in an array that does not have an explicit binding. This change also marks resource arrays declared at a global scope as `static`, which is what is already done for standalone resources. --- clang/lib/Sema/SemaHLSL.cpp | 57 +++ .../test/AST/HLSL/resource_binding_attr.hlsl | 28 +++-- 2 files changed, 69 insertions(+), 16 deletions(-) diff --git a/clang/lib/Sema/SemaHLSL.cpp b/clang/lib/Sema/SemaHLSL.cpp index 873efdae38f18..ffb996e79409c 100644 --- a/clang/lib/Sema/SemaHLSL.cpp +++ b/clang/lib/Sema/SemaHLSL.cpp @@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) { llvm_unreachable("unexpected ResourceClass value"); } +static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) { + return getRegisterType(ResTy->getAttrs().ResourceClass); +} + // Converts the first letter of string Slot to RegisterType. // Returns false if the letter does not correspond to a valid register type. static bool convertToRegisterType(StringRef Slot, RegisterType *RT) { @@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) { return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray(); } +static const HLSLAttributedResourceType * +getResourceArrayHandleType(VarDecl *VD) { + assert(VD->getType()->isHLSLResourceRecordArray() && + "expected array of resource records"); + const Type *Ty = VD->getType()->getUnqualifiedDesugaredType(); + while (const ConstantArrayType *CAT = dyn_cast(Ty)) { +Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); + } + return HLSLAttributedResourceType::findHandleTypeOnResource(Ty); +} + // Returns true if the type is a leaf element type that is not valid to be // included in HLSL Buffer, such as a resource class, empty struct, zero-sized // array, or a builtin intangible type. Returns false it is a valid leaf element @@ -568,16 +583,13 @@ void createHostLayoutStructForBuffer(Sema &S, HLSLBufferDecl *BufDecl) { BufDecl->addLayoutStruct(LS); } -static void addImplicitBindingAttrToBuffer(Sema &S, HLSLBufferDecl *BufDecl, - uint32_t ImplicitBindingOrderID) { - RegisterType RT = - BufDecl->isCBuffer() ? RegisterType::CBuffer : RegisterType::SRV; +static void addImplicitBindingAttrToDecl(Sema &S, Decl *D, RegisterType RT, + uint32_t ImplicitBindingOrderID) { auto *Attr = HLSLResourceBindingAttr::CreateImplicit(S.getASTContext(), "", "0", {}); - std::optional RegSlot; - Attr->setBinding(RT, RegSlot, 0); + Attr->setBinding(RT, std::nullopt, 0); Attr->setImplicitBindingOrderID(ImplicitBindingOrderID); - BufDecl->addAttr(Attr); + D->addAttr(Attr); } // Handle end of cbuffer/tbuffer declaration @@ -600,7 +612,10 @@ void SemaHLSL::ActOnFinishBuffer(Decl *Dcl, SourceLocation RBrace) { if (RBA) RBA->setImplicitBindingOrderID(OrderID); else - addImplicitBindingAttrToBuffer(SemaRef, BufDecl, OrderID); + addImplicitBindingAttrToDecl(SemaRef, BufDecl, + BufDecl->isCBuffer() ? 
RegisterType::CBuffer +: RegisterType::SRV, + OrderID); } SemaRef.PopDeclContext(); @@ -1906,7 +1921,7 @@ static bool DiagnoseLocalRegisterBinding(Sema &S, SourceLocation &ArgLoc, if (const HLSLAttributedResourceType *AttrResType = HLSLAttributedResourceType::findHandleTypeOnResource( VD->getType().getTypePtr())) { -if (RegType == getRegisterType(AttrResType->getAttrs().ResourceClass)) +if (RegType == getRegisterType(AttrResType)) return true; S.Diag(D->getLocation(), diag::err_hlsl_binding_type_mismatch) @@ -2439,8 +2454,8 @@ void SemaHLSL::ActOnEndOfTranslationUnit(TranslationUnitDecl *TU) { HLSLBufferDecl *DefaultCBuffer = HLSLBufferDecl::CreateDefaultCBuffer( SemaRef.getASTContext(), SemaRef.getCurLexicalContext(), DefaultCBufferDecls); -addImplicitBindingAttrToBuffer(SemaRef, DefaultCBuffer, - getNextImplicitBindingOrderID()); +addImplicitBindingAttrToDecl(SemaRef, DefaultCBuffer, RegisterType::CBuffer, + getNextImplicitBindingOrder
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -974,6 +974,11 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA, } break; } + case Intrinsic::loop_dependence_raw_mask: + case Intrinsic::loop_dependence_war_mask: +if (ST->hasSVE2()) + return 1; +return InstructionCost::getInvalid(CostKind); SamTebbs33 wrote: The intrinsics do expand into a [lot of instructions](https://github.com/llvm/llvm-project/pull/117007/files#diff-d7065626b3d269e24241429ce037d51fc91d5ead5896d67fcc038aefcfd2R1806), so I'm keen to hear people's opinions on whether invalid is better than calculating the cost of them, since that will probably be very high. https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
https://github.com/hekota created https://github.com/llvm/llvm-project/pull/152454 Adds support for accessing individual resources from fixed-size resource arrays declared at global scope. When a global resource array is indexed to retrieve a specific resource, the codegen translates the `ArraySubscriptExpr` AST node to a constructor call for the corresponding resource record type and binding. Closes #145424 >From 86902233a96b26b710bd39c096cb581f252e09a4 Mon Sep 17 00:00:00 2001 From: Helena Kotas Date: Thu, 7 Aug 2025 01:30:36 -0700 Subject: [PATCH] [HLSL] Global resource arrays element access Adds support for accessing individual resources from fixed-size resource arrays declared at global scope. When a global resource array is indexed to retrieve a specific resource, the codegen translates the `ArraySubscriptExpr` into a constructor call for the corresponding resource record type and binding. Closes #145424 --- clang/include/clang/Sema/SemaHLSL.h | 9 +- clang/lib/CodeGen/CGExpr.cpp | 10 + clang/lib/CodeGen/CGHLSLRuntime.cpp | 223 +- clang/lib/CodeGen/CGHLSLRuntime.h | 6 + clang/lib/CodeGen/CodeGenModule.cpp | 4 +- clang/lib/Sema/SemaHLSL.cpp | 93 ++-- .../resources/res-array-global-multi-dim.hlsl | 32 +++ .../resources/res-array-global.hlsl | 59 + clang/test/CodeGenHLSL/static-local-ctor.hlsl | 5 +- 9 files changed, 401 insertions(+), 40 deletions(-) create mode 100644 clang/test/CodeGenHLSL/resources/res-array-global-multi-dim.hlsl create mode 100644 clang/test/CodeGenHLSL/resources/res-array-global.hlsl diff --git a/clang/include/clang/Sema/SemaHLSL.h b/clang/include/clang/Sema/SemaHLSL.h index 085c9ed9f3ebd..0c215c6e10013 100644 --- a/clang/include/clang/Sema/SemaHLSL.h +++ b/clang/include/clang/Sema/SemaHLSL.h @@ -229,10 +229,17 @@ class SemaHLSL : public SemaBase { void diagnoseAvailabilityViolations(TranslationUnitDecl *TU); - bool initGlobalResourceDecl(VarDecl *VD); uint32_t getNextImplicitBindingOrderID() { return ImplicitBindingNextOrderID++; } + + bool initGlobalResourceDecl(VarDecl *VD); + bool initGlobalResourceArrayDecl(VarDecl *VD); + void createResourceRecordCtorArgs(const Type *ResourceTy, StringRef VarName, +HLSLResourceBindingAttr *RBA, +HLSLVkBindingAttr *VkBinding, +uint32_t ArrayIndex, +llvm::SmallVector &Args); }; } // namespace clang diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp index ed35a055d8a7f..8c34fb501a3b8 100644 --- a/clang/lib/CodeGen/CGExpr.cpp +++ b/clang/lib/CodeGen/CGExpr.cpp @@ -16,6 +16,7 @@ #include "CGCall.h" #include "CGCleanup.h" #include "CGDebugInfo.h" +#include "CGHLSLRuntime.h" #include "CGObjCRuntime.h" #include "CGOpenMPRuntime.h" #include "CGRecordLayout.h" @@ -4532,6 +4533,15 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const ArraySubscriptExpr *E, LHS.getBaseInfo(), TBAAAccessInfo()); } + // The HLSL runtime handle the subscript expression on global resource arrays. + if (getLangOpts().HLSL && (E->getType()->isHLSLResourceRecord() || + E->getType()->isHLSLResourceRecordArray())) { +std::optional LV = +CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this); +if (LV.has_value()) + return *LV; + } + // All the other cases basically behave like simple offsetting. // Handle the extvector case we ignored above. 
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp b/clang/lib/CodeGen/CGHLSLRuntime.cpp index 918cb3e38448d..a09e540367a18 100644 --- a/clang/lib/CodeGen/CGHLSLRuntime.cpp +++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp @@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion RootSigVer, RootSignatureValMD->addOperand(MDVals); } +// If the specified expr is a simple decay from an array to pointer, +// return the array subexpression. Otherwise, return nullptr. +static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) { + const auto *CE = dyn_cast(E); + if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay) +return nullptr; + return CE->getSubExpr(); +} + +// Find array variable declaration from nested array subscript AST nodes +static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) { + const Expr *E = nullptr; + while (ASE != nullptr) { +E = getSubExprFromArrayDecayOperand(ASE->getBase()); +if (!E) + return nullptr; +ASE = dyn_cast(E); + } + if (const DeclRefExpr *DRE = dyn_cast_or_null(E)) +return DRE->getDecl(); + return nullptr; +} + +// Get the total size of the array, or -1 if the array is unbounded. +static int getTotalArraySize(const clang::Type *Ty) { + assert(Ty->isArrayType() && "expected array type"); + if (
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/145330 >From ec5c4d315a4611383838d8b6d517dfb5a5de7806 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Tue, 17 Jun 2025 04:03:53 -0400 Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125. --- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 2 +- .../CodeGen/SelectionDAG/TargetLowering.cpp | 21 +- llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 6 +- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 7 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 67 ++ .../AMDGPU/ptradd-sdag-optimizations.ll | 196 ++ 6 files changed, 105 insertions(+), 194 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index 649a3107cc21c..e908c50b6caed 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -8389,7 +8389,7 @@ static bool isMemSrcFromConstant(SDValue Src, ConstantDataArraySlice &Slice) { GlobalAddressSDNode *G = nullptr; if (Src.getOpcode() == ISD::GlobalAddress) G = cast(Src); - else if (Src.getOpcode() == ISD::ADD && + else if (Src->isAnyAdd() && Src.getOperand(0).getOpcode() == ISD::GlobalAddress && Src.getOperand(1).getOpcode() == ISD::Constant) { G = cast(Src.getOperand(0)); diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index e235d144e85ff..6010ce78cf4d9 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -632,8 +632,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned BitWidth, // operands on the new node are also disjoint. SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint : SDNodeFlags::None); + unsigned Opcode = Op.getOpcode(); + if (Opcode == ISD::PTRADD) { +// It isn't a ptradd anymore if it doesn't operate on the entire +// pointer. +Opcode = ISD::ADD; + } SDValue X = DAG.getNode( - Op.getOpcode(), dl, SmallVT, + Opcode, dl, SmallVT, DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)), DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags); assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?"); @@ -2861,6 +2867,11 @@ bool TargetLowering::SimplifyDemandedBits( return TLO.CombineTo(Op, And1); } [[fallthrough]]; + case ISD::PTRADD: +if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType()) + break; +// PTRADD behaves like ADD if pointers are represented as integers. +[[fallthrough]]; case ISD::ADD: case ISD::SUB: { // Add, Sub, and Mul don't demand any bits in positions beyond that @@ -2970,10 +2981,10 @@ bool TargetLowering::SimplifyDemandedBits( if (Op.getOpcode() == ISD::MUL) { Known = KnownBits::mul(KnownOp0, KnownOp1); -} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB. +} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB. 
Known = KnownBits::computeForAddSub( - Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(), - Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1); + Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(), + KnownOp0, KnownOp1); } break; } @@ -5696,7 +5707,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const GlobalValue *&GA, return true; } - if (N->getOpcode() == ISD::ADD) { + if (N->isAnyAdd()) { SDValue N1 = N->getOperand(0); SDValue N2 = N->getOperand(1); if (isGAPlusOffset(N1.getNode(), GA, Offset)) { diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp index fb83388e5e265..aea1b9461da89 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp @@ -1489,7 +1489,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, SDValue &Ptr, SDValue &VAddr, C1 = nullptr; } - if (N0.getOpcode() == ISD::ADD) { + if (N0->isAnyAdd()) { // (add N2, N3) -> addr64, or // (add (add N2, N3), C1) -> addr64 SDValue N2 = N0.getOperand(0); @@ -1951,7 +1951,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr, } // Match the variable offset. - if (Addr.getOpcode() == ISD::ADD) { + if (Addr->isAnyAdd()) { LHS = Addr.getOperand(0); if (!LHS
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (PR #145329)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/145329 >From b4212e94fbf40d8b9bebdb346f7aee103f5d561e Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Tue, 17 Jun 2025 03:51:19 -0400 Subject: [PATCH] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases Pre-committing tests to show improvements in a follow-up PR. --- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 63 ++ .../AMDGPU/ptradd-sdag-optimizations.ll | 206 ++ 2 files changed, 269 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll new file mode 100644 index 0..fab56383ffa8a --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll @@ -0,0 +1,63 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=1 < %s | FileCheck --check-prefixes=GFX6,GFX6_PTRADD %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=0 < %s | FileCheck --check-prefixes=GFX6,GFX6_LEGACY %s + +; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectMUBUF. + +define amdgpu_kernel void @v_add_i32(ptr addrspace(1) %out, ptr addrspace(1) %in) { +; GFX6_PTRADD-LABEL: v_add_i32: +; GFX6_PTRADD: ; %bb.0: +; GFX6_PTRADD-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0 +; GFX6_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX6_PTRADD-NEXT:s_mov_b32 s7, 0x100f000 +; GFX6_PTRADD-NEXT:s_mov_b32 s10, 0 +; GFX6_PTRADD-NEXT:s_mov_b32 s11, s7 +; GFX6_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX6_PTRADD-NEXT:v_mov_b32_e32 v1, s3 +; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, s2, v0 +; GFX6_PTRADD-NEXT:v_addc_u32_e32 v1, vcc, 0, v1, vcc +; GFX6_PTRADD-NEXT:s_mov_b32 s8, s10 +; GFX6_PTRADD-NEXT:s_mov_b32 s9, s10 +; GFX6_PTRADD-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc +; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0) +; GFX6_PTRADD-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 offset:4 glc +; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0) +; GFX6_PTRADD-NEXT:s_mov_b32 s6, -1 +; GFX6_PTRADD-NEXT:s_mov_b32 s4, s0 +; GFX6_PTRADD-NEXT:s_mov_b32 s5, s1 +; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, v2, v0 +; GFX6_PTRADD-NEXT:buffer_store_dword v0, off, s[4:7], 0 +; GFX6_PTRADD-NEXT:s_endpgm +; +; GFX6_LEGACY-LABEL: v_add_i32: +; GFX6_LEGACY: ; %bb.0: +; GFX6_LEGACY-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0 +; GFX6_LEGACY-NEXT:s_mov_b32 s7, 0x100f000 +; GFX6_LEGACY-NEXT:s_mov_b32 s10, 0 +; GFX6_LEGACY-NEXT:s_mov_b32 s11, s7 +; GFX6_LEGACY-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX6_LEGACY-NEXT:s_waitcnt lgkmcnt(0) +; GFX6_LEGACY-NEXT:s_mov_b64 s[8:9], s[2:3] +; GFX6_LEGACY-NEXT:v_mov_b32_e32 v1, 0 +; GFX6_LEGACY-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc +; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0) +; GFX6_LEGACY-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 offset:4 glc +; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0) +; GFX6_LEGACY-NEXT:s_mov_b32 s6, -1 +; GFX6_LEGACY-NEXT:s_mov_b32 s4, s0 +; GFX6_LEGACY-NEXT:s_mov_b32 s5, s1 +; GFX6_LEGACY-NEXT:v_add_i32_e32 v0, vcc, v2, v0 +; GFX6_LEGACY-NEXT:buffer_store_dword v0, off, s[4:7], 0 +; GFX6_LEGACY-NEXT:s_endpgm + %tid = call i32 @llvm.amdgcn.workitem.id.x() + %gep = getelementptr inbounds i32, ptr addrspace(1) %in, i32 %tid + %b_ptr = getelementptr i32, ptr addrspace(1) %gep, i32 1 + %a = load volatile i32, ptr addrspace(1) %gep + %b = load volatile i32, ptr addrspace(1) %b_ptr + %result = add i32 %a, %b + store i32 %result, ptr addrspace(1) %out + ret 
void +} + +;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line: +; GFX6: {{.*}} diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll index b7bfc5a7c..1a54ba716a80a 100644 --- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll @@ -291,3 +291,209 @@ define ptr @fold_mul24_into_mad(ptr %base, i64 %a, i64 %b) { %gep = getelementptr inbounds i8, ptr %base, i64 %mul ret ptr %gep } + +; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectGlobalSAddr. +define amdgpu_kernel void @uniform_base_varying_offset_imm(ptr addrspace(1) %p) { +; GFX942_PTRADD-LABEL: uniform_base_varying_offset_imm: +; GFX942_PTRADD: ; %bb.0: ; %entry +; GFX942_PTRADD-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0 +; GFX942_PTRADD-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, 0 +; GFX942_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v2, 1 +; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_PTRADD-NEXT:v_lshl_add_u64 v[0:1], s[0:1], 0, v[0:1] +; GFX942_PTRAD
[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/152493 Backport f73a302 Requested by: @DavidTruby >From 332baaaee9815118a44982c1efd1dc14dc16ae6c Mon Sep 17 00:00:00 2001 From: David Truby Date: Thu, 7 Aug 2025 13:09:35 +0100 Subject: [PATCH] [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) Currrently flang-rt assumes that LLVM was always built with the dynamic MSVC runtime. This may not be the case, if the user has specified a different runtime with -DCMAKE_MSVC_RUNTIME_LIBRARY. Since this flag is implied by -DLLVM_ENABLE_RPMALLOC=On, which is used by the Windows release script, this is causing that script to fail. Fixes #151920 (cherry picked from commit f73a3028c2d46928280d69d9e953ff79d2eb0fbb) --- flang-rt/lib/runtime/CMakeLists.txt | 32 + flang-rt/unittests/CMakeLists.txt | 8 2 files changed, 23 insertions(+), 17 deletions(-) diff --git a/flang-rt/lib/runtime/CMakeLists.txt b/flang-rt/lib/runtime/CMakeLists.txt index 332c0872e065f..dc2db1d9902cb 100644 --- a/flang-rt/lib/runtime/CMakeLists.txt +++ b/flang-rt/lib/runtime/CMakeLists.txt @@ -251,19 +251,33 @@ else() add_win_flangrt_runtime(STATIC dynamic MultiThreadedDLL INSTALL_WITH_TOOLCHAIN) add_win_flangrt_runtime(STATIC dynamic_dbg MultiThreadedDebugDLL INSTALL_WITH_TOOLCHAIN) - # Unittests link against LLVMSupport which is using CMake's default runtime - # library selection, which is either MultiThreadedDLL or MultiThreadedDebugDLL - # depending on the configuration. They have to match or linking will fail. + # Unittests link against LLVMSupport. If CMAKE_MSVC_RUNTIME_LIBRARY is set, + # that will have been used for LLVMSupport so it must also be used here. + # Otherwise this will use CMake's default runtime library selection, which + # is either MultiThreadedDLL or MultiThreadedDebugDLL depending on the configuration. + # They have to match or linking will fail. if (GENERATOR_IS_MULTI_CONFIG) # We cannot select an ALIAS library because it may be different # per configuration. Fallback to CMake's default. add_win_flangrt_runtime(STATIC unittest "" EXCLUDE_FROM_ALL) else () -string(TOLOWER ${CMAKE_BUILD_TYPE} build_type) -if (build_type STREQUAL "debug") - add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic_dbg) -else () - add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic) -endif () +# Check if CMAKE_MSVC_RUNTIME_LIBRARY was set. +if (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreaded") +add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.static) +elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDLL") +add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic) +elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDebug") +add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.static_dbg) +elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDebugDLL") +add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic_dbg) +else() + # Default based on the build type. 
+ string(TOLOWER ${CMAKE_BUILD_TYPE} build_type) + if (build_type STREQUAL "debug") + add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic_dbg) + else () + add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic) + endif () +endif() endif () endif() diff --git a/flang-rt/unittests/CMakeLists.txt b/flang-rt/unittests/CMakeLists.txt index 831bc8a4c2906..fd63ad11dcf43 100644 --- a/flang-rt/unittests/CMakeLists.txt +++ b/flang-rt/unittests/CMakeLists.txt @@ -94,14 +94,6 @@ function(add_flangrt_unittest test_dirname) target_link_libraries(${test_dirname} PRIVATE ${ARG_LINK_LIBS}) add_flangrt_unittest_offload_properties(${test_dirname}) add_flangrt_dependent_libs(${test_dirname}) - - # Required because LLVMSupport is compiled with this option. - # FIXME: According to CMake documentation, this is the default. Why is it - #needed? LLVM's add_unittest doesn't set it either. - set_target_properties(${test_dirname} - PROPERTIES -MSVC_RUNTIME_LIBRARY "MultiThreaded$<$:Debug>DLL" -) endfunction() function(add_flangrt_nongtest_unittest test_name) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)
llvmbot wrote: @Meinersbur What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/152493 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/152493 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add a few missing mfma rewrite tests (PR #149026)
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/149026 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)
@@ -0,0 +1,171 @@ +//===- AutomapToTargetData.cpp ---===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// + +#include "flang/Optimizer/Builder/DirectivesCommon.h" +#include "flang/Optimizer/Builder/FIRBuilder.h" +#include "flang/Optimizer/Builder/HLFIRTools.h" +#include "flang/Optimizer/Dialect/FIROps.h" +#include "flang/Optimizer/Dialect/FIRType.h" +#include "flang/Optimizer/Dialect/Support/KindMapping.h" +#include "flang/Optimizer/HLFIR/HLFIROps.h" +#include "mlir/IR/BuiltinAttributes.h" +#include "mlir/Pass/Pass.h" +#include "llvm/Frontend/OpenMP/OMPConstants.h" +#include +#include + +namespace flangomp { +#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS +#include "flang/Optimizer/OpenMP/Passes.h.inc" +} // namespace flangomp + +using namespace mlir; + +namespace { +class AutomapToTargetDataPass +: public flangomp::impl::AutomapToTargetDataPassBase< + AutomapToTargetDataPass> { + // Returns true if the variable has a dynamic size and therefore requires + // bounds operations to describe its extents. + bool needsBoundsOps(Value var) { TIFitis wrote: I've moved both as static functions to _flang/include/flang/Support/OpenMP-utils.h_. Let me know if that's alright. https://github.com/llvm/llvm-project/pull/151989 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests). * **#152592** 👈 (this PR) * **#152584** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/152592 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/152592 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) { return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray(); } +static const HLSLAttributedResourceType * +getResourceArrayHandleType(VarDecl *VD) { + assert(VD->getType()->isHLSLResourceRecordArray() && + "expected array of resource records"); + const Type *Ty = VD->getType()->getUnqualifiedDesugaredType(); + while (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(Ty)) { +Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); + } hekota wrote: It is grabbing the array element type (i.e., the actual resource type). Multi-dimensional arrays are represented by nested `ConstantArrayType` instances, so to get to the element type the array type needs to be "unwrapped" in a loop. https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
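A made-up example may help picture the nesting being unwrapped here:

```cpp
// Illustration only; the declaration below is hypothetical, not from the PR.
//
//   RWBuffer<float> Buffers[4][3];
//
// The VarDecl's type desugars to nested ConstantArrayType nodes:
//
//   ConstantArrayType(size 4)
//     ConstantArrayType(size 3)
//       RWBuffer<float>          // the resource record type
//
// Each iteration of the while loop in getResourceArrayHandleType strips one
// array level, so after two iterations only RWBuffer<float> remains and its
// handle type (the HLSLAttributedResourceType) can be looked up.
```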
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)
https://github.com/asl approved this pull request. https://github.com/llvm/llvm-project/pull/152586 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ir] MD_prof is not UB-implying (PR #152420)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/152420 >From f0cf2e9a7ad9b45a6270c727b60e4cd15ea57d27 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Wed, 6 Aug 2025 17:43:35 -0700 Subject: [PATCH] [ir] MD_prof is not UB-implying --- llvm/lib/IR/Metadata.cpp | 4 ++ .../Transforms/LICM/hoist-phi-metadata.ll | 46 +++ 2 files changed, 50 insertions(+) diff --git a/llvm/lib/IR/Metadata.cpp b/llvm/lib/IR/Metadata.cpp index 1157cbe6bbc1b..ba838cd2793ce 100644 --- a/llvm/lib/IR/Metadata.cpp +++ b/llvm/lib/IR/Metadata.cpp @@ -57,6 +57,8 @@ using namespace llvm; +extern cl::opt ProfcheckDisableMetadataFixes; + MetadataAsValue::MetadataAsValue(Type *Ty, Metadata *MD) : Value(Ty, MetadataAsValueVal), MD(MD) { track(); @@ -1678,6 +1680,8 @@ void Instruction::dropUnknownNonDebugMetadata(ArrayRef KnownIDs) { // A DIAssignID attachment is debug metadata, don't drop it. KnownSet.insert(LLVMContext::MD_DIAssignID); + if (!ProfcheckDisableMetadataFixes) +KnownSet.insert(LLVMContext::MD_prof); Value::eraseMetadataIf([&KnownSet](unsigned MDKind, MDNode *Node) { return !KnownSet.count(MDKind); diff --git a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll index e98de9c79ea8c..6034d12d931c2 100644 --- a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll +++ b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll @@ -45,6 +45,46 @@ end: ret void } +declare i32 @getv() + +; indirect.goto.dest2 should get hoisted, and that should not result +; in a loss of profiling info +define i32 @test19(i1 %cond, i1 %cond2, ptr %address, i32 %v1) nounwind { +; CHECK-LABEL: define i32 @test19 +; CHECK-SAME: (i1 [[COND:%.*]], i1 [[COND2:%.*]], ptr [[ADDRESS:%.*]], i32 [[V1:%.*]]) #[[ATTR0:[0-9]+]] { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[INDIRECT_GOTO_DEST:%.*]] = select i1 [[COND]], ptr blockaddress(@test19, [[EXIT:%.*]]), ptr [[ADDRESS]], !prof [[PROF9:![0-9]+]] +; CHECK-NEXT:[[INDIRECT_GOTO_DEST2:%.*]] = select i1 [[COND2]], ptr blockaddress(@test19, [[EXIT]]), ptr [[ADDRESS]], !prof [[PROF10:![0-9]+]] +; CHECK-NEXT:br label [[L0:%.*]] +; CHECK: L0: +; CHECK-NEXT:[[V2:%.*]] = call i32 @getv() +; CHECK-NEXT:[[SINKABLE:%.*]] = mul i32 [[V1]], [[V2]] +; CHECK-NEXT:[[SINKABLE2:%.*]] = add i32 [[V1]], [[V2]] +; CHECK-NEXT:indirectbr ptr [[INDIRECT_GOTO_DEST]], [label [[L1:%.*]], label %exit] +; CHECK: L1: +; CHECK-NEXT:indirectbr ptr [[INDIRECT_GOTO_DEST2]], [label [[L0]], label %exit] +; CHECK: exit: +; CHECK-NEXT:[[R:%.*]] = phi i32 [ [[SINKABLE]], [[L0]] ], [ [[SINKABLE2]], [[L1]] ] +; CHECK-NEXT:ret i32 [[R]] +; +entry: + br label %L0 +L0: + %indirect.goto.dest = select i1 %cond, ptr blockaddress(@test19, %exit), ptr %address, !prof !10 + %v2 = call i32 @getv() + %sinkable = mul i32 %v1, %v2 + %sinkable2 = add i32 %v1, %v2 + indirectbr ptr %indirect.goto.dest, [label %L1, label %exit] + +L1: + %indirect.goto.dest2 = select i1 %cond2, ptr blockaddress(@test19, %exit), ptr %address, !prof !11 + indirectbr ptr %indirect.goto.dest2, [label %L0, label %exit] + +exit: + %r = phi i32 [%sinkable, %L0], [%sinkable2, %L1] + ret i32 %r +} + !llvm.module.flags = !{!2, !3} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1) @@ -57,6 +97,10 @@ end: !7 = !DILocation(line: 3, column: 22, scope: !4) !8 = !{!"branch_weights", i32 5, i32 7} !9 = !{!"branch_weights", i32 13, i32 11} +!10 = !{!"branch_weights", i32 101, i32 189} +!11 = !{!"branch_weights", i32 67, i32 1} +;. +; CHECK: attributes #[[ATTR0]] = { nounwind } ;. 
; CHECK: [[META0:![0-9]+]] = !{i32 7, !"Dwarf Version", i32 5} ; CHECK: [[META1:![0-9]+]] = !{i32 2, !"Debug Info Version", i32 3} @@ -67,4 +111,6 @@ end: ; CHECK: [[PROF6]] = !{!"branch_weights", i32 5, i32 7} ; CHECK: [[DBG7]] = !DILocation(line: 3, column: 22, scope: [[META3]]) ; CHECK: [[PROF8]] = !{!"branch_weights", i32 13, i32 11} +; CHECK: [[PROF9]] = !{!"branch_weights", i32 101, i32 189} +; CHECK: [[PROF10]] = !{!"branch_weights", i32 67, i32 1} ;. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
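A minimal sketch of what the change above means for callers, assuming the in-tree Instruction::dropUnknownNonDebugMetadata(ArrayRef<unsigned>) signature; the helper name below is illustrative and not part of the patch:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Instruction.h"

// Sketch only: with the patch, !prof joins !DIAssignID as metadata that is
// kept even when it is not in the caller's "known" set, unless the
// ProfcheckDisableMetadataFixes flag is set. Previously a hoisting pass such
// as LICM calling this with an empty set would silently drop branch_weights.
void stripUnknownMetadata(llvm::Instruction &I) {
  I.dropUnknownNonDebugMetadata(llvm::ArrayRef<unsigned>());
}
```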
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -34,6 +34,10 @@ RWBuffer UAV1 : register(u2), UAV2 : register(u4); // CHECK: HLSLResourceBindingAttr {{.*}} "" "space5" RWBuffer UAV3 : register(space5); +// CHECK: VarDecl {{.*}} UAV_Array 'RWBuffer[10]' hekota wrote: Will do. https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Stanislav Mekhanoshin (rampitec) Changes --- Patch is 80.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152578.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+1) - (modified) llvm/test/CodeGen/AMDGPU/bf16-math.ll (+29-14) - (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+362-777) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 8f44c03d95b43..fd1be72ce6d82 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -6106,6 +6106,7 @@ bool SITargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF, case MVT::f64: return true; case MVT::f16: + case MVT::bf16: return Subtarget->has16BitInsts() && !denormalModeIsFlushAllF64F16(MF); default: break; diff --git a/llvm/test/CodeGen/AMDGPU/bf16-math.ll b/llvm/test/CodeGen/AMDGPU/bf16-math.ll index 682b3b4d57209..3a82f848f06a5 100644 --- a/llvm/test/CodeGen/AMDGPU/bf16-math.ll +++ b/llvm/test/CodeGen/AMDGPU/bf16-math.ll @@ -370,6 +370,9 @@ define amdgpu_ps bfloat @test_clamp_bf16_folding(bfloat %src) { ; GCN: ; %bb.0: ; GCN-NEXT:v_exp_bf16_e64 v0, v0 clamp ; GCN-NEXT:; return to shader part epilog + + + %exp = call bfloat @llvm.exp2.bf16(bfloat %src) %max = call bfloat @llvm.maxnum.bf16(bfloat %exp, bfloat 0.0) %clamp = call bfloat @llvm.minnum.bf16(bfloat %max, bfloat 1.0) @@ -381,6 +384,9 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x bfloat> %src0, <2 x bfloa ; GCN: ; %bb.0: ; GCN-NEXT:v_pk_mul_bf16 v0, v0, v1 clamp ; GCN-NEXT:; return to shader part epilog + + + %mul = fmul <2 x bfloat> %src0, %src1 %max = call <2 x bfloat> @llvm.maxnum.v2bf16(<2 x bfloat> %mul, <2 x bfloat> ) %clamp = call <2 x bfloat> @llvm.minnum.v2bf16(<2 x bfloat> %max, <2 x bfloat> ) @@ -391,11 +397,12 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x bfloat> %src0, <2 x bfloa define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> %b, <2 x bfloat> %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_vvv: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, v3 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, v4 +; GCN-NEXT:v_pk_fma_bf16 v2, v2, v3, v4 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -405,11 +412,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_vss: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, s1 +; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, s1 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -419,11 +427,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x bfloat> inreg %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_sss: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, s0, s1 +; 
GCN-NEXT:v_mov_b32_e32 v2, s2 ; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, s2 +; GCN-NEXT:v_pk_fma_bf16 v2, s0, s1, v2 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -433,11 +444,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> inreg %b) { ; GCN-LABEL: v_test_mul_add_v2bf16_vsc: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, 0.5 op_sel_hi:[1,0] +; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, 0.5 op_sel_hi:[1,1,0] ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, store <2 x bfloat> %add, ptr addrspace(1) %out @@ -447,11 +459,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_a
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)
llvmbot wrote: @asl What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/152587 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/152587 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/152587 Backport 726847829553079a13b1b7104f2c2db9dcda9c1d Requested by: @ojhunt >From 9a524d13b390693d91742c4f8b7465a7963b0edf Mon Sep 17 00:00:00 2001 From: Oliver Hunt Date: Tue, 5 Aug 2025 17:41:55 -0700 Subject: [PATCH] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) The codegen for the final class dynamic_cast optimization fails to consider pointer authentication. This change resolves this be simply disabling the optimization when pointer authentication enabled. (cherry picked from commit 726847829553079a13b1b7104f2c2db9dcda9c1d) --- clang/lib/CodeGen/CGExprCXX.cpp | 3 ++- clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp index 359e30cb8f5cd..912b1d72c7e23 100644 --- a/clang/lib/CodeGen/CGExprCXX.cpp +++ b/clang/lib/CodeGen/CGExprCXX.cpp @@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address ThisAddr, bool IsExact = !IsDynamicCastToVoid && CGM.getCodeGenOpts().OptimizationLevel > 0 && DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() && - CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy); + CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) && + !getLangOpts().PointerAuthCalls; // C++ [expr.dynamic.cast]p4: // If the value of v is a null pointer value in the pointer case, the result diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp index 9a8ce1997a7f9..19c2a9bd0497e 100644 --- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp +++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp @@ -3,6 +3,7 @@ // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT +// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT struct A { virtual ~A(); }; struct B final : A { }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
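For context, the source pattern exercised by the test in the diff looks like the following; the wrapper function name is illustrative, and the lowering notes are a hedged summary of the exact dynamic_cast optimization rather than part of the patch:

```cpp
// Adapted from the test: B is final, so at -O1 and above clang can normally
// lower this dynamic_cast to a direct vtable-pointer comparison instead of a
// __dynamic_cast runtime call. With -fptrauth-calls the vtable pointer is
// signed, so the backported change keeps the runtime call in that mode.
struct A { virtual ~A() {} };
struct B final : A {};

B *asB(A *a) { return dynamic_cast<B *>(a); }
```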
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)
llvmbot wrote: @asl What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/152586 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: None (llvmbot) Changes Backport 726847829553079a13b1b7104f2c2db9dcda9c1d Requested by: @ojhunt --- Full diff: https://github.com/llvm/llvm-project/pull/152587.diff 2 Files Affected: - (modified) clang/lib/CodeGen/CGExprCXX.cpp (+2-1) - (modified) clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp (+1) ``diff diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp index 359e30cb8f5cd..912b1d72c7e23 100644 --- a/clang/lib/CodeGen/CGExprCXX.cpp +++ b/clang/lib/CodeGen/CGExprCXX.cpp @@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address ThisAddr, bool IsExact = !IsDynamicCastToVoid && CGM.getCodeGenOpts().OptimizationLevel > 0 && DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() && - CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy); + CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) && + !getLangOpts().PointerAuthCalls; // C++ [expr.dynamic.cast]p4: // If the value of v is a null pointer value in the pointer case, the result diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp index 9a8ce1997a7f9..19c2a9bd0497e 100644 --- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp +++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp @@ -3,6 +3,7 @@ // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT +// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT struct A { virtual ~A(); }; struct B final : A { }; `` https://github.com/llvm/llvm-project/pull/152587 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
https://github.com/V-FEXrt approved this pull request. https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/152578). Learn more: https://graphite.dev/docs/merge-pull-requests * **#152578** 👈 (this PR, view in Graphite) * **#152573** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev https://github.com/llvm/llvm-project/pull/152578 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/152578 None >From 6a9971d7cadb2dcc0169f02f92bd3f1eafb65635 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 7 Aug 2025 12:11:17 -0700 Subject: [PATCH] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp |1 + llvm/test/CodeGen/AMDGPU/bf16-math.ll | 43 +- llvm/test/CodeGen/AMDGPU/bf16.ll | 1139 +++-- 3 files changed, 392 insertions(+), 791 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 8f44c03d95b43..fd1be72ce6d82 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -6106,6 +6106,7 @@ bool SITargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF, case MVT::f64: return true; case MVT::f16: + case MVT::bf16: return Subtarget->has16BitInsts() && !denormalModeIsFlushAllF64F16(MF); default: break; diff --git a/llvm/test/CodeGen/AMDGPU/bf16-math.ll b/llvm/test/CodeGen/AMDGPU/bf16-math.ll index 682b3b4d57209..3a82f848f06a5 100644 --- a/llvm/test/CodeGen/AMDGPU/bf16-math.ll +++ b/llvm/test/CodeGen/AMDGPU/bf16-math.ll @@ -370,6 +370,9 @@ define amdgpu_ps bfloat @test_clamp_bf16_folding(bfloat %src) { ; GCN: ; %bb.0: ; GCN-NEXT:v_exp_bf16_e64 v0, v0 clamp ; GCN-NEXT:; return to shader part epilog + + + %exp = call bfloat @llvm.exp2.bf16(bfloat %src) %max = call bfloat @llvm.maxnum.bf16(bfloat %exp, bfloat 0.0) %clamp = call bfloat @llvm.minnum.bf16(bfloat %max, bfloat 1.0) @@ -381,6 +384,9 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x bfloat> %src0, <2 x bfloa ; GCN: ; %bb.0: ; GCN-NEXT:v_pk_mul_bf16 v0, v0, v1 clamp ; GCN-NEXT:; return to shader part epilog + + + %mul = fmul <2 x bfloat> %src0, %src1 %max = call <2 x bfloat> @llvm.maxnum.v2bf16(<2 x bfloat> %mul, <2 x bfloat> ) %clamp = call <2 x bfloat> @llvm.minnum.v2bf16(<2 x bfloat> %max, <2 x bfloat> ) @@ -391,11 +397,12 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x bfloat> %src0, <2 x bfloa define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> %b, <2 x bfloat> %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_vvv: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, v3 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, v4 +; GCN-NEXT:v_pk_fma_bf16 v2, v2, v3, v4 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -405,11 +412,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_vss: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, s1 +; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, s1 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -419,11 +427,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x bfloat> inreg %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) { ; GCN-LABEL: v_test_mul_add_v2bf16_sss: ; GCN: ; 
%bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, s0, s1 +; GCN-NEXT:v_mov_b32_e32 v2, s2 ; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, s2 +; GCN-NEXT:v_pk_fma_bf16 v2, s0, s1, v2 ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, %c store <2 x bfloat> %add, ptr addrspace(1) %out @@ -433,11 +444,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x bfl define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr addrspace(1) %out, <2 x bfloat> %a, <2 x bfloat> inreg %b) { ; GCN-LABEL: v_test_mul_add_v2bf16_vsc: ; GCN: ; %bb.0: -; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0 -; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1) -; GCN-NEXT:v_pk_add_bf16 v2, v2, 0.5 op_sel_hi:[1,0] +; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, 0.5 op_sel_hi:[1,1,0] ; GCN-NEXT:global_store_b32 v[0:1], v2, off ; GCN-NEXT:s_endpgm + + + %mul = fmul contract <2 x bfloat> %a, %b %add = fadd contract <2 x bfloat> %mul, store <2 x bfloat> %add, ptr addrspace(1) %out @@ -447,11 +459,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf1
[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/152578 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)
https://github.com/ojhunt milestoned https://github.com/llvm/llvm-project/pull/152271 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)
ojhunt wrote: /cherry-pick 726847829553079a13b1b7104f2c2db9dcda9c1d https://github.com/llvm/llvm-project/pull/152271 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)
https://github.com/ojhunt closed https://github.com/llvm/llvm-project/pull/152271 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/152586 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)
llvmbot wrote: /pull-request llvm/llvm-project#152586 https://github.com/llvm/llvm-project/pull/152271 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/152586 Backport 726847829553079a13b1b7104f2c2db9dcda9c1d Requested by: @ojhunt >From 789c9330fa0195dc5f9cdada51ae0f187197d562 Mon Sep 17 00:00:00 2001 From: Oliver Hunt Date: Tue, 5 Aug 2025 17:41:55 -0700 Subject: [PATCH] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) The codegen for the final class dynamic_cast optimization fails to consider pointer authentication. This change resolves this be simply disabling the optimization when pointer authentication enabled. (cherry picked from commit 726847829553079a13b1b7104f2c2db9dcda9c1d) --- clang/lib/CodeGen/CGExprCXX.cpp | 3 ++- clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp index 359e30cb8f5cd..912b1d72c7e23 100644 --- a/clang/lib/CodeGen/CGExprCXX.cpp +++ b/clang/lib/CodeGen/CGExprCXX.cpp @@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address ThisAddr, bool IsExact = !IsDynamicCastToVoid && CGM.getCodeGenOpts().OptimizationLevel > 0 && DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() && - CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy); + CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) && + !getLangOpts().PointerAuthCalls; // C++ [expr.dynamic.cast]p4: // If the value of v is a null pointer value in the pointer case, the result diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp index 9a8ce1997a7f9..19c2a9bd0497e 100644 --- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp +++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp @@ -3,6 +3,7 @@ // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT +// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT struct A { virtual ~A(); }; struct B final : A { }; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen Author: None (llvmbot) Changes Backport 726847829553079a13b1b7104f2c2db9dcda9c1d Requested by: @ojhunt --- Full diff: https://github.com/llvm/llvm-project/pull/152586.diff 2 Files Affected: - (modified) clang/lib/CodeGen/CGExprCXX.cpp (+2-1) - (modified) clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp (+1) ``diff diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp index 359e30cb8f5cd..912b1d72c7e23 100644 --- a/clang/lib/CodeGen/CGExprCXX.cpp +++ b/clang/lib/CodeGen/CGExprCXX.cpp @@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address ThisAddr, bool IsExact = !IsDynamicCastToVoid && CGM.getCodeGenOpts().OptimizationLevel > 0 && DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() && - CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy); + CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) && + !getLangOpts().PointerAuthCalls; // C++ [expr.dynamic.cast]p4: // If the value of v is a null pointer value in the pointer case, the result diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp index 9a8ce1997a7f9..19c2a9bd0497e 100644 --- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp +++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp @@ -3,6 +3,7 @@ // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT +// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls -emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT struct A { virtual ~A(); }; struct B final : A { }; `` https://github.com/llvm/llvm-project/pull/152586 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)
https://github.com/changpeng approved this pull request. https://github.com/llvm/llvm-project/pull/152592 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) { return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray(); } +static const HLSLAttributedResourceType * +getResourceArrayHandleType(VarDecl *VD) { + assert(VD->getType()->isHLSLResourceRecordArray() && + "expected array of resource records"); + const Type *Ty = VD->getType()->getUnqualifiedDesugaredType(); + while (const ConstantArrayType *CAT = dyn_cast(Ty)) { +Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); + } V-FEXrt wrote: Ahh, that makes sense! https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)
@@ -3532,6 +3533,28 @@ void Verifier::visitFPToSIInst(FPToSIInst &I) { visitInstruction(I); } +void Verifier::visitPtrToAddrInst(PtrToAddrInst &I) { + // Get the source and destination types + Type *SrcTy = I.getOperand(0)->getType(); + Type *DestTy = I.getType(); + + Check(SrcTy->isPtrOrPtrVectorTy(), "PtrToAddr source must be pointer", &I); + Check(DestTy->isIntOrIntVectorTy(), "PtrToAddr result must be integral", &I); + Check(SrcTy->isVectorTy() == DestTy->isVectorTy(), "PtrToAddr type mismatch", +&I); + + if (SrcTy->isVectorTy()) { +auto *VSrc = cast(SrcTy); +auto *VDest = cast(DestTy); +Check(VSrc->getElementCount() == VDest->getElementCount(), + "PtrToAddr vector width mismatch", &I); + } + + Type *AddrTy = DL.getAddressType(SrcTy); + Check(AddrTy == DestTy, "PtrToAddr result must be address width", &I); + visitInstruction(I); +} arichardson wrote: I added some basic checks, but noticed we don't check ConstantAggregate values, so I'll deal with that in a follow up. https://github.com/llvm/llvm-project/pull/139357 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)
https://github.com/arichardson updated https://github.com/llvm/llvm-project/pull/139357 >From 25dc175562349410f161ef0e80246301d9a7ba79 Mon Sep 17 00:00:00 2001 From: Alex Richardson Date: Fri, 9 May 2025 22:43:37 -0700 Subject: [PATCH] fix docs build Created using spr 1.3.6-beta.1 --- llvm/docs/LangRef.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 2d18d0d97aaee..38be6918ff73c 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -12435,7 +12435,7 @@ Example: .. _i_ptrtoaddr: '``ptrtoaddr .. to``' Instruction - +^ Syntax: """ ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)
@@ -3532,6 +3533,28 @@ void Verifier::visitFPToSIInst(FPToSIInst &I) { visitInstruction(I); } +void Verifier::visitPtrToAddrInst(PtrToAddrInst &I) { + // Get the source and destination types + Type *SrcTy = I.getOperand(0)->getType(); + Type *DestTy = I.getType(); + + Check(SrcTy->isPtrOrPtrVectorTy(), "PtrToAddr source must be pointer", &I); + Check(DestTy->isIntOrIntVectorTy(), "PtrToAddr result must be integral", &I); + Check(SrcTy->isVectorTy() == DestTy->isVectorTy(), "PtrToAddr type mismatch", +&I); + + if (SrcTy->isVectorTy()) { +auto *VSrc = cast(SrcTy); +auto *VDest = cast(DestTy); +Check(VSrc->getElementCount() == VDest->getElementCount(), + "PtrToAddr vector width mismatch", &I); arichardson wrote: Fixed and also changed ptrtoint and inttoptr https://github.com/llvm/llvm-project/pull/139357 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) { llvm_unreachable("unexpected ResourceClass value"); } +static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) { bob80905 wrote: You might consider renaming one of these functions, maybe add a "FromResourceType" to this new one. https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -34,6 +34,10 @@ RWBuffer UAV1 : register(u2), UAV2 : register(u4); // CHECK: HLSLResourceBindingAttr {{.*}} "" "space5" RWBuffer UAV3 : register(space5); +// CHECK: VarDecl {{.*}} UAV_Array 'RWBuffer[10]' bob80905 wrote: Should we add a test case where an HLSLVkBindingAttr already exists (I presume an explicit binding case), check for the HLSLVkBindingAttr, and use CHECK-NOT to verify that the new attribute is not added? https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)
@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) { return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray(); } +static const HLSLAttributedResourceType * +getResourceArrayHandleType(VarDecl *VD) { + assert(VD->getType()->isHLSLResourceRecordArray() && + "expected array of resource records"); + const Type *Ty = VD->getType()->getUnqualifiedDesugaredType(); + while (const ConstantArrayType *CAT = dyn_cast(Ty)) { +Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); + } V-FEXrt wrote: This is grabbing the last value in the CAT? also nit: ```suggestion while (const ConstantArrayType *CAT = dyn_cast(Ty)) Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType(); ``` https://github.com/llvm/llvm-project/pull/152452 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
https://github.com/gbossu edited https://github.com/llvm/llvm-project/pull/152554 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/152592 Change from GFX12: Relax S_CLAUSE rules to all all non-flat memory types in the same clause, and all Flat types in the same. For VMEM/FLAT clause types now look like: - Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async - Flat: load, store, atomic >From 7800b5d664f487df9fddbb085d9578812f598ec0 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Thu, 7 Aug 2025 13:34:44 -0700 Subject: [PATCH] [AMDGPU] Adjust hard clause rules for gfx1250 Change from GFX12: Relax S_CLAUSE rules to all all non-flat memory types in the same clause, and all Flat types in the same. For VMEM/FLAT clause types now look like: - Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async - Flat: load, store, atomic --- .../lib/Target/AMDGPU/SIInsertHardClauses.cpp | 6 +- .../test/CodeGen/AMDGPU/flat-saddr-atomics.ll | 4 + llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll | 5 +- .../CodeGen/AMDGPU/hard-clauses-gfx1250.mir | 608 +- .../AMDGPU/llvm.amdgcn.struct.buffer.store.ll | 1 + 5 files changed, 617 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp index d8fe8505bc722..0a68512668c7d 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp @@ -51,7 +51,7 @@ static cl::opt namespace { enum HardClauseType { - // For GFX10: + // For GFX10 and GFX1250: // Texture, buffer, global or scratch memory instructions. HARDCLAUSE_VMEM, @@ -102,7 +102,8 @@ class SIInsertHardClauses { HardClauseType getHardClauseType(const MachineInstr &MI) { if (MI.mayLoad() || (MI.mayStore() && ST->shouldClusterStores())) { - if (ST->getGeneration() == AMDGPUSubtarget::GFX10) { + if (ST->getGeneration() == AMDGPUSubtarget::GFX10 || + ST->hasGFX1250Insts()) { if ((SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) || SIInstrInfo::isSegmentSpecificFLAT(MI)) { if (ST->hasNSAClauseBug()) { @@ -115,7 +116,6 @@ class SIInsertHardClauses { if (SIInstrInfo::isFLAT(MI)) return HARDCLAUSE_FLAT; } else { -assert(ST->getGeneration() >= AMDGPUSubtarget::GFX11); if (SIInstrInfo::isMIMG(MI)) { const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode()); const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo = diff --git a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll index 7d36c9f07ea73..004d3c0c1cf53 100644 --- a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll +++ b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll @@ -284,6 +284,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr inreg %sbase, i32 %vof ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo +; GFX1250-SDAG-NEXT:s_clause 0x1 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0 @@ -329,6 +330,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr inreg %sbase, i32 %vof ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo +; GFX1250-GISEL-NEXT:s_clause 0x1 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0 @@ -382,6 +384,7 @@ define amdgpu_ps <2 x float> 
@flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo +; GFX1250-SDAG-NEXT:s_clause 0x1 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0 @@ -430,6 +433,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo +; GFX1250-GISEL-NEXT:s_clause 0x1 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0 diff --git a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll index 3a898a9214461..f0db321d3931a 100644 --- a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll +++ b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll @@ -244,
[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Stanislav Mekhanoshin (rampitec) Changes Change from GFX12: Relax S_CLAUSE rules to all all non-flat memory types in the same clause, and all Flat types in the same. For VMEM/FLAT clause types now look like: - Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async - Flat: load, store, atomic --- Patch is 61.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152592.diff 5 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp (+3-3) - (modified) llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll (+4) - (modified) llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll (+3-2) - (modified) llvm/test/CodeGen/AMDGPU/hard-clauses-gfx1250.mir (+606-2) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.store.ll (+1) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp index d8fe8505bc722..0a68512668c7d 100644 --- a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp +++ b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp @@ -51,7 +51,7 @@ static cl::opt namespace { enum HardClauseType { - // For GFX10: + // For GFX10 and GFX1250: // Texture, buffer, global or scratch memory instructions. HARDCLAUSE_VMEM, @@ -102,7 +102,8 @@ class SIInsertHardClauses { HardClauseType getHardClauseType(const MachineInstr &MI) { if (MI.mayLoad() || (MI.mayStore() && ST->shouldClusterStores())) { - if (ST->getGeneration() == AMDGPUSubtarget::GFX10) { + if (ST->getGeneration() == AMDGPUSubtarget::GFX10 || + ST->hasGFX1250Insts()) { if ((SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) || SIInstrInfo::isSegmentSpecificFLAT(MI)) { if (ST->hasNSAClauseBug()) { @@ -115,7 +116,6 @@ class SIInsertHardClauses { if (SIInstrInfo::isFLAT(MI)) return HARDCLAUSE_FLAT; } else { -assert(ST->getGeneration() >= AMDGPUSubtarget::GFX11); if (SIInstrInfo::isMIMG(MI)) { const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode()); const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo = diff --git a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll index 7d36c9f07ea73..004d3c0c1cf53 100644 --- a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll +++ b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll @@ -284,6 +284,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr inreg %sbase, i32 %vof ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo +; GFX1250-SDAG-NEXT:s_clause 0x1 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0 @@ -329,6 +330,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr inreg %sbase, i32 %vof ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo +; GFX1250-GISEL-NEXT:s_clause 0x1 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0 @@ -382,6 +384,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo +; GFX1250-SDAG-NEXT:s_clause 0x1 ; 
GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0 @@ -430,6 +433,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1) ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo +; GFX1250-GISEL-NEXT:s_clause 0x1 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0 diff --git a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll index 3a898a9214461..f0db321d3931a 100644 --- a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll +++ b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll @@ -244,8 +244,9 @@ define i32 @test_v64i32_load_store(ptr addrspace(1) %ptr, i32 %idx, ptr addrspac ; GCN-GISEL-NEXT:global_load_b128 v[60:63], v[0:1], off offset:16 ; GCN-GISEL-NEXT:global_load_b128 v[0:3], v[0:1], off offset:240 ; GCN-GISEL-NEXT:s_wait_loadcnt 0x0 -; GCN-GI
[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)
https://github.com/asl approved this pull request. https://github.com/llvm/llvm-project/pull/152587 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
@@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion RootSigVer, RootSignatureValMD->addOperand(MDVals); } +// If the specified expr is a simple decay from an array to pointer, +// return the array subexpression. Otherwise, return nullptr. +static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) { + const auto *CE = dyn_cast(E); + if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay) +return nullptr; + return CE->getSubExpr(); +} + +// Find array variable declaration from nested array subscript AST nodes +static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) { + const Expr *E = nullptr; + while (ASE != nullptr) { +E = getSubExprFromArrayDecayOperand(ASE->getBase()); +if (!E) + return nullptr; +ASE = dyn_cast(E); + } + if (const DeclRefExpr *DRE = dyn_cast_or_null(E)) +return DRE->getDecl(); + return nullptr; +} + +// Get the total size of the array, or -1 if the array is unbounded. +static int getTotalArraySize(const clang::Type *Ty) { + assert(Ty->isArrayType() && "expected array type"); + if (Ty->isIncompleteArrayType()) +return -1; + int Size = 1; + while (const auto *CAT = dyn_cast(Ty)) { +Size *= CAT->getSExtSize(); +Ty = CAT->getArrayElementTypeNoTypeQual(); + } + return Size; +} + +// Find constructor decl for a specific resource record type and binding +// (implicit vs. explicit). The constructor has 6 parameters. +// For explicit binding the signature is: +// void(unsigned, unsigned, int, unsigned, const char *). +// For implicit binding the signature is: +// void(unsigned, int, unsigned, unsigned, const char *). +static CXXConstructorDecl *findResourceConstructorDecl(ASTContext &AST, + QualType ResTy, + bool ExplicitBinding) { + SmallVector ExpParmTypes = { + AST.UnsignedIntTy, AST.UnsignedIntTy, AST.UnsignedIntTy, + AST.UnsignedIntTy, AST.getPointerType(AST.CharTy.withConst())}; + ExpParmTypes[ExplicitBinding ? 
2 : 1] = AST.IntTy; + + CXXRecordDecl *ResDecl = ResTy->getAsCXXRecordDecl(); + for (auto *Ctor : ResDecl->ctors()) { +if (Ctor->getNumParams() != ExpParmTypes.size()) + continue; +ParmVarDecl **ParmIt = Ctor->param_begin(); +QualType *ExpTyIt = ExpParmTypes.begin(); +for (; ParmIt != Ctor->param_end() && ExpTyIt != ExpParmTypes.end(); + ++ParmIt, ++ExpTyIt) { + if ((*ParmIt)->getType() != *ExpTyIt) +break; +} +if (ParmIt == Ctor->param_end()) + return Ctor; + } + llvm_unreachable("did not find constructor for resource class"); +} + +static Value *buildNameForResource(llvm::StringRef BaseName, + CodeGenModule &CGM) { + std::string Str(BaseName); + std::string GlobalName(Str + ".str"); + return CGM.GetAddrOfConstantCString(Str, GlobalName.c_str()).getPointer(); +} + +static void createResourceCtorArgs(CodeGenModule &CGM, CXXConstructorDecl *CD, + llvm::Value *ThisPtr, llvm::Value *Range, + llvm::Value *Index, StringRef Name, + HLSLResourceBindingAttr *RBA, + HLSLVkBindingAttr *VkBinding, + CallArgList &Args) { + assert((VkBinding || RBA) && "at least one a binding attribute expected"); + + std::optional RegisterSlot; + uint32_t SpaceNo = 0; + if (VkBinding) { +RegisterSlot = VkBinding->getBinding(); +SpaceNo = VkBinding->getSet(); + } else if (RBA) { +if (RBA->hasRegisterSlot()) + RegisterSlot = RBA->getSlotNumber(); +SpaceNo = RBA->getSpaceNumber(); + } + + ASTContext &AST = CD->getASTContext(); + Value *NameStr = buildNameForResource(Name, CGM); + Value *Space = llvm::ConstantInt::get(CGM.IntTy, SpaceNo); + + Args.add(RValue::get(ThisPtr), CD->getThisType()); + if (RegisterSlot.has_value()) { alsepkow wrote: Ah, maybe this is where the order would matter? Do we want the argument ordering to match for both the explicit/implicit case? Right now they are different and that would explain the difference in ordering in the test cases that I commented on. https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
https://github.com/alsepkow edited https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
https://github.com/alsepkow commented: Submitting a couple comments. https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [CI] Setup generate_report to describe ninja failures (PR #152621)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/152621 This patch makes generate_report add information about failed build actions to the summary report. This makes it significantly easier to find compilation failures, especially given that we run ninja with -k 0. This patch only does the integration into generate_report (along with testing); actual use in the script is split into a separate patch to keep things clean. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [lldb] [PATCH 7/7] [clang] improve NestedNameSpecifier: LLDB changes (PR #149949)
mizvekov wrote: I managed to fix that; it was a problem with using `lld` instead of the macOS linker. https://github.com/llvm/llvm-project/pull/149949 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
@@ -0,0 +1,59 @@ +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute -finclude-default-header \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,DXIL +// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan-compute \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,SPV + +// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 1 +// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 1 +// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 1 +// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 1 + +// different explicit binding for DXIL and SPIR-V +[[vk::binding(12, 2)]] +RWBuffer A[4] : register(u10, space1); + +[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0 +RWBuffer B[5]; // DXIL implicit binding in space0 + +// same explicit binding for both DXIL and SPIR-V +// (SPIR-V takes the binding from register annotation if there is no vk::binding attribute)) +RWBuffer C[3] : register(u2); + +// implicit binding for both DXIL and SPIR-V in space/set 0 +RWBuffer D[10]; + +RWStructuredBuffer Out; + +[numthreads(4,1,1)] +void main() { + // CHECK: define internal{{.*}} void @_Z4mainv() + // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer + + // Make sure A[2] is translated to a RWBuffer constructor call with range 4 and index 2 + // and DXIL explicit binding (u10, space1) + // and SPIR-V explicit binding (binding 12, set 2) + // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]]) + // SPV: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 noundef 12, i32 noundef 2, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]]) + + // Make sure B[3] is translated to a RWBuffer constructor call with range 5 and index 3 + // and DXIL for implicit binding in space0, order id 0 + // and SPIR-V explicit binding (binding 13, set 0) + // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjijjPKc(ptr {{.*}} %[[Tmp1]], i32 noundef 0, i32 noundef 5, i32 noundef 3, i32 noundef 0, ptr noundef @[[BufB]]) + // SPV: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp1]], i32 noundef 13, i32 noundef 0, i32 noundef 5, i32 noundef 3, ptr noundef @[[BufB]]) + + // Make sure C[1] is translated to a RWBuffer constructor call with range 3 and index 1 + // and DXIL explicit binding (u2, space0) + // and SPIR-V explicit binding (binding 2, set 0) + // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp2]], i32 noundef 2, i32 noundef 0, i32 noundef 3, i32 noundef 1, ptr noundef @[[BufC]]) + // SPV: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp2]], i32 noundef 2, i32 noundef 0, i32 noundef 3, i32 noundef 1, ptr noundef @[[BufC]]) + + // Make sure D[7] is translated to a RWBuffer constructor call with range 10 and index 7 + // and DXIL for implicit binding in space0, order id 1 + // and SPIR-V explicit binding (binding 13, set 0), order id 0 alsepkow wrote: Wouldn't this be an implicit SPIR-V binding? https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
@@ -0,0 +1,59 @@ +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute -finclude-default-header \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,DXIL +// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan-compute \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,SPV + +// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 1 +// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 1 +// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 1 +// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 1 + +// different explicit binding for DXIL and SPIR-V +[[vk::binding(12, 2)]] +RWBuffer A[4] : register(u10, space1); + +[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0 +RWBuffer B[5]; // DXIL implicit binding in space0 + +// same explicit binding for both DXIL and SPIR-V +// (SPIR-V takes the binding from register annotation if there is no vk::binding attribute)) +RWBuffer C[3] : register(u2); + +// implicit binding for both DXIL and SPIR-V in space/set 0 +RWBuffer D[10]; + +RWStructuredBuffer Out; + +[numthreads(4,1,1)] +void main() { + // CHECK: define internal{{.*}} void @_Z4mainv() + // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer + + // Make sure A[2] is translated to a RWBuffer constructor call with range 4 and index 2 + // and DXIL explicit binding (u10, space1) + // and SPIR-V explicit binding (binding 12, set 2) + // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]]) + // SPV: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 noundef 12, i32 noundef 2, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]]) + + // Make sure B[3] is translated to a RWBuffer constructor call with range 5 and index 3 + // and DXIL for implicit binding in space0, order id 0 + // and SPIR-V explicit binding (binding 13, set 0) + // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjijjPKc(ptr {{.*}} %[[Tmp1]], i32 noundef 0, i32 noundef 5, i32 noundef 3, i32 noundef 0, ptr noundef @[[BufB]]) alsepkow wrote: The ordering of the operands seems inconsistent. When we're checking A on line 38 it looks like the first operand is the register (10) and the second is the space (0). But here it looks like we're being implicitly assigned register 0 and space 5. But the comment for B says implicit binding in space0. https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)
@@ -0,0 +1,59 @@ +// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute -finclude-default-header \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,DXIL +// RUN: %clang_cc1 -finclude-default-header -triple spirv-unknown-vulkan-compute \ +// RUN: -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s -check-prefixes=CHECK,SPV + +// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 1 +// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 1 +// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 1 +// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 1 + +// different explicit binding for DXIL and SPIR-V +[[vk::binding(12, 2)]] +RWBuffer A[4] : register(u10, space1); + +[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0 +RWBuffer B[5]; // DXIL implicit binding in space0 + +// same explicit binding for both DXIL and SPIR-V +// (SPIR-V takes the binding from register annotation if there is no vk::binding attribute)) +RWBuffer C[3] : register(u2); + +// implicit binding for both DXIL and SPIR-V in space/set 0 +RWBuffer D[10]; + +RWStructuredBuffer Out; + +[numthreads(4,1,1)] +void main() { + // CHECK: define internal{{.*}} void @_Z4mainv() + // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer + // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer + + // Make sure A[2] is translated to a RWBuffer constructor call with range 4 and index 2 + // and DXIL explicit binding (u10, space1) + // and SPIR-V explicit binding (binding 12, set 2) + // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]]) alsepkow wrote: Curious about the mangled name in here '@_ZN4hlsl8RWBufferIfEC1EjjijPKc' Could that change? More so I'm wondering why we have that as part of the string we're matching here. To me it seems like that would be something we don't car as much about. https://github.com/llvm/llvm-project/pull/152454 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [CI] Enable Build Failure Reporting (PR #152622)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/152622 This patch finishes the plumbing so that generate_test_report dumps build failures into the GitHub checks summary. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits