[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread Kelvin Li via llvm-branch-commits

https://github.com/kkwli approved this pull request.

LG. Thanks.

https://github.com/llvm/llvm-project/pull/152458
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Helena Kotas (hekota)


Changes

Adds support for accessing individual resources from fixed-size global resource 
arrays.

Design proposal: 
https://github.com/llvm/wg-hlsl/blob/main/proposals/0028-resource-arrays.md

Enables indexing into globally scoped, fixed-size resource arrays to retrieve 
individual resources. The initialization logic is primarily handled during 
codegen. When a global resource array is indexed, the codegen translates the 
`ArraySubscriptExpr` AST node into a constructor call for the corresponding 
resource record type and binding.

To support this behavior, Sema needs to ensure that:
- The constructor for the specific resource type is instantiated.
- An implicit binding attribute is added to resource arrays that lack explicit 
bindings (#152452).

Closes #145424

Depends on #152450 and #152452.
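
For readers unfamiliar with the idea, the mapping from a subscript on a fixed-size resource array to one individual resource can be sketched roughly as follows. This is a standalone toy model, not the code from this patch: `ResourceArrayDecl`, `flattenIndex`, and `slotForElement` are hypothetical names, and the assumption that element `i` of the flattened array lands on register `FirstSlot + i` is an illustration rather than something the description above states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a global resource array declaration:
// its fixed dimensions plus the first register slot of its binding range.
struct ResourceArrayDecl {
  std::vector<uint32_t> Dims; // e.g. {4, 2} for an array declared as A[4][2]
  uint32_t FirstSlot;         // first register of the binding range (assumed)
};

// Total number of resources in the array (product of all dimensions).
uint32_t totalArraySize(const ResourceArrayDecl &D) {
  uint32_t Size = 1;
  for (uint32_t Dim : D.Dims)
    Size *= Dim;
  return Size;
}

// Flatten subscripts (outermost first) into a row-major index.
// Returns std::nullopt if the subscript count or any index is out of range.
std::optional<uint32_t> flattenIndex(const ResourceArrayDecl &D,
                                     const std::vector<uint32_t> &Subscripts) {
  if (Subscripts.size() != D.Dims.size())
    return std::nullopt;
  uint32_t Index = 0;
  for (std::size_t I = 0; I < D.Dims.size(); ++I) {
    if (Subscripts[I] >= D.Dims[I])
      return std::nullopt;
    Index = Index * D.Dims[I] + Subscripts[I];
  }
  return Index;
}

// Register slot of one element, assuming slots are assigned contiguously
// in flattened order starting at FirstSlot.
std::optional<uint32_t> slotForElement(const ResourceArrayDecl &D,
                                       const std::vector<uint32_t> &Subscripts) {
  std::optional<uint32_t> Index = flattenIndex(D, Subscripts);
  if (!Index)
    return std::nullopt;
  return D.FirstSlot + *Index;
}
```

In this model, an array with dimensions `{4, 2}` bound at slot 10 maps `A[1][0]` to flattened index 2 and therefore to slot 12; an out-of-range subscript yields no slot at all.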

---

Patch is 28.60 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152454.diff


9 Files Affected:

- (modified) clang/include/clang/Sema/SemaHLSL.h (+8-1) 
- (modified) clang/lib/CodeGen/CGExpr.cpp (+10) 
- (modified) clang/lib/CodeGen/CGHLSLRuntime.cpp (+211-12) 
- (modified) clang/lib/CodeGen/CGHLSLRuntime.h (+6) 
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+2-2) 
- (modified) clang/lib/Sema/SemaHLSL.cpp (+70-23) 
- (added) clang/test/CodeGenHLSL/resources/res-array-global-multi-dim.hlsl 
(+32) 
- (added) clang/test/CodeGenHLSL/resources/res-array-global.hlsl (+59) 
- (modified) clang/test/CodeGenHLSL/static-local-ctor.hlsl (+3-2) 


``diff
diff --git a/clang/include/clang/Sema/SemaHLSL.h 
b/clang/include/clang/Sema/SemaHLSL.h
index 085c9ed9f3ebd..0c215c6e10013 100644
--- a/clang/include/clang/Sema/SemaHLSL.h
+++ b/clang/include/clang/Sema/SemaHLSL.h
@@ -229,10 +229,17 @@ class SemaHLSL : public SemaBase {
 
   void diagnoseAvailabilityViolations(TranslationUnitDecl *TU);
 
-  bool initGlobalResourceDecl(VarDecl *VD);
   uint32_t getNextImplicitBindingOrderID() {
 return ImplicitBindingNextOrderID++;
   }
+
+  bool initGlobalResourceDecl(VarDecl *VD);
+  bool initGlobalResourceArrayDecl(VarDecl *VD);
+  void createResourceRecordCtorArgs(const Type *ResourceTy, StringRef VarName,
+HLSLResourceBindingAttr *RBA,
+HLSLVkBindingAttr *VkBinding,
+uint32_t ArrayIndex,
+llvm::SmallVector &Args);
 };
 
 } // namespace clang
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index ed35a055d8a7f..8c34fb501a3b8 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -16,6 +16,7 @@
 #include "CGCall.h"
 #include "CGCleanup.h"
 #include "CGDebugInfo.h"
+#include "CGHLSLRuntime.h"
 #include "CGObjCRuntime.h"
 #include "CGOpenMPRuntime.h"
 #include "CGRecordLayout.h"
@@ -4532,6 +4533,15 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const 
ArraySubscriptExpr *E,
  LHS.getBaseInfo(), TBAAAccessInfo());
   }
 
+  // The HLSL runtime handle the subscript expression on global resource 
arrays.
+  if (getLangOpts().HLSL && (E->getType()->isHLSLResourceRecord() ||
+ E->getType()->isHLSLResourceRecordArray())) {
+std::optional LV =
+CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this);
+if (LV.has_value())
+  return *LV;
+  }
+
   // All the other cases basically behave like simple offsetting.
 
   // Handle the extvector case we ignored above.
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp 
b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 918cb3e38448d..a09e540367a18 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion 
RootSigVer,
   RootSignatureValMD->addOperand(MDVals);
 }
 
+// If the specified expr is a simple decay from an array to pointer,
+// return the array subexpression. Otherwise, return nullptr.
+static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) {
+  const auto *CE = dyn_cast(E);
+  if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay)
+return nullptr;
+  return CE->getSubExpr();
+}
+
+// Find array variable declaration from nested array subscript AST nodes
+static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) {
+  const Expr *E = nullptr;
+  while (ASE != nullptr) {
+E = getSubExprFromArrayDecayOperand(ASE->getBase());
+if (!E)
+  return nullptr;
+ASE = dyn_cast(E);
+  }
+  if (const DeclRefExpr *DRE = dyn_cast_or_null(E))
+return DRE->getDecl();
+  return nullptr;
+}
+
+// Get the total size of the array, or -1 if the array is unbounded.
+static int getTotalArraySize(const clang::Type *Ty) {
+  assert(Ty->isArrayType() && "expected array type");
+  if (Ty->isIncompleteArrayType())
+return -1;
+  int Size = 1;
+  while (const

[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota ready_for_review 
https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)

2025-08-07 Thread Changpeng Fang via llvm-branch-commits

changpeng wrote:

> Rebase and updated new test checks. @changpeng, could you please verify if 
> the AMDGPU/no-folding-imm-to-inst-with-fi.ll test that #151263 recently added 
> still does what it is supposed to do with the updated checks in this PR?

It is good (as long as it passes)

https://github.com/llvm/llvm-project/pull/146076


[llvm-branch-commits] [llvm] release/21.x: [DAG] visitFREEZE - limit freezing of multiple operands (PR #150425)

2025-08-07 Thread Simon Pilgrim via llvm-branch-commits

RKSimon wrote:

@tru @nikic Is there anything that I still need to do here?

https://github.com/llvm/llvm-project/pull/150425


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota ready_for_review 
https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-clang

@llvm/pr-subscribers-hlsl

Author: Helena Kotas (hekota)


Changes

If a resource array does not have an explicit binding attribute, SemaHLSL will 
add an implicit one. The attribute will be used to transfer implicit binding 
order ID to the codegen, the same way as it is done for HLSLBufferDecls. This 
is necessary in order to generate correct initialization of resources in an 
array that does not have an explicit binding.

Depends on #152450

Part 1 of #145424
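
The bookkeeping described above can be pictured with a small standalone model. It is only a sketch under assumptions: `BindingAttr`, `ResourceDecl`, and `BindingAssigner` are hypothetical types invented for illustration, and the real attribute classes in clang carry more state than shown here. The one name borrowed from the patch series is `getNextImplicitBindingOrderID`, which is modelled as a simple post-incrementing counter.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified binding attribute.
struct BindingAttr {
  std::optional<uint32_t> Slot;            // explicit register slot, if any
  std::optional<uint32_t> ImplicitOrderID; // order ID used for implicit binding
};

// Hypothetical, simplified resource (array) declaration.
struct ResourceDecl {
  std::string Name;
  std::optional<BindingAttr> Binding;
};

class BindingAssigner {
  uint32_t ImplicitBindingNextOrderID = 0;

public:
  uint32_t getNextImplicitBindingOrderID() {
    return ImplicitBindingNextOrderID++;
  }

  // Declarations with an explicit slot are left untouched. All others
  // receive an order ID, on a newly created attribute if none exists yet.
  void process(ResourceDecl &D) {
    if (D.Binding && D.Binding->Slot)
      return;
    if (!D.Binding)
      D.Binding = BindingAttr{std::nullopt, std::nullopt};
    D.Binding->ImplicitOrderID = getNextImplicitBindingOrderID();
  }
};
```

Processing declarations in source order gives each unbound declaration a stable, increasing order ID, which a later stage could use to hand out register slots deterministically.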

---
Full diff: https://github.com/llvm/llvm-project/pull/152452.diff


2 Files Affected:

- (modified) clang/lib/Sema/SemaHLSL.cpp (+44-11) 
- (modified) clang/test/AST/HLSL/resource_binding_attr.hlsl (+20) 


``diff
diff --git a/clang/lib/Sema/SemaHLSL.cpp b/clang/lib/Sema/SemaHLSL.cpp
index 17f17f8114373..6811f3f27603b 100644
--- a/clang/lib/Sema/SemaHLSL.cpp
+++ b/clang/lib/Sema/SemaHLSL.cpp
@@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) {
   llvm_unreachable("unexpected ResourceClass value");
 }
 
+static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) {
+  return getRegisterType(ResTy->getAttrs().ResourceClass);
+}
+
 // Converts the first letter of string Slot to RegisterType.
 // Returns false if the letter does not correspond to a valid register type.
 static bool convertToRegisterType(StringRef Slot, RegisterType *RT) {
@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) {
   return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray();
 }
 
+static const HLSLAttributedResourceType *
+getResourceArrayHandleType(VarDecl *VD) {
+  assert(VD->getType()->isHLSLResourceRecordArray() &&
+ "expected array of resource records");
+  const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
+  while (const ConstantArrayType *CAT = dyn_cast(Ty)) {
+Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
+  }
+  return HLSLAttributedResourceType::findHandleTypeOnResource(Ty);
+}
+
 // Returns true if the type is a leaf element type that is not valid to be
 // included in HLSL Buffer, such as a resource class, empty struct, zero-sized
 // array, or a builtin intangible type. Returns false it is a valid leaf 
element
@@ -568,16 +583,13 @@ void createHostLayoutStructForBuffer(Sema &S, 
HLSLBufferDecl *BufDecl) {
   BufDecl->addLayoutStruct(LS);
 }
 
-static void addImplicitBindingAttrToBuffer(Sema &S, HLSLBufferDecl *BufDecl,
-   uint32_t ImplicitBindingOrderID) {
-  RegisterType RT =
-  BufDecl->isCBuffer() ? RegisterType::CBuffer : RegisterType::SRV;
+static void addImplicitBindingAttrToDecl(Sema &S, Decl *D, RegisterType RT,
+ uint32_t ImplicitBindingOrderID) {
   auto *Attr =
   HLSLResourceBindingAttr::CreateImplicit(S.getASTContext(), "", "0", {});
-  std::optional RegSlot;
-  Attr->setBinding(RT, RegSlot, 0);
+  Attr->setBinding(RT, std::nullopt, 0);
   Attr->setImplicitBindingOrderID(ImplicitBindingOrderID);
-  BufDecl->addAttr(Attr);
+  D->addAttr(Attr);
 }
 
 // Handle end of cbuffer/tbuffer declaration
@@ -600,7 +612,10 @@ void SemaHLSL::ActOnFinishBuffer(Decl *Dcl, SourceLocation 
RBrace) {
 if (RBA)
   RBA->setImplicitBindingOrderID(OrderID);
 else
-  addImplicitBindingAttrToBuffer(SemaRef, BufDecl, OrderID);
+  addImplicitBindingAttrToDecl(SemaRef, BufDecl,
+   BufDecl->isCBuffer() ? RegisterType::CBuffer
+: RegisterType::SRV,
+   OrderID);
   }
 
   SemaRef.PopDeclContext();
@@ -1906,7 +1921,7 @@ static bool DiagnoseLocalRegisterBinding(Sema &S, 
SourceLocation &ArgLoc,
   if (const HLSLAttributedResourceType *AttrResType =
   HLSLAttributedResourceType::findHandleTypeOnResource(
   VD->getType().getTypePtr())) {
-if (RegType == getRegisterType(AttrResType->getAttrs().ResourceClass))
+if (RegType == getRegisterType(AttrResType))
   return true;
 
 S.Diag(D->getLocation(), diag::err_hlsl_binding_type_mismatch)
@@ -2439,8 +2454,8 @@ void 
SemaHLSL::ActOnEndOfTranslationUnit(TranslationUnitDecl *TU) {
 HLSLBufferDecl *DefaultCBuffer = HLSLBufferDecl::CreateDefaultCBuffer(
 SemaRef.getASTContext(), SemaRef.getCurLexicalContext(),
 DefaultCBufferDecls);
-addImplicitBindingAttrToBuffer(SemaRef, DefaultCBuffer,
-   getNextImplicitBindingOrderID());
+addImplicitBindingAttrToDecl(SemaRef, DefaultCBuffer, 
RegisterType::CBuffer,
+ getNextImplicitBindingOrderID());
 SemaRef.getCurLexicalContext()->addDecl(DefaultCBuffer);
 createHostLayoutStructForBuffer(SemaRef, DefaultCBuffer);
 
@@ -3640,6 +3655,24 @@ void SemaHLSL::ActOnVariableDeclarator(VarDecl *VD) {
 
 // process explicit bindings
 processExplicitBindingsOnDec

[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu created 
https://github.com/llvm/llvm-project/pull/152553

They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.

In the previous tests using ld+extract+st, the extract was optimized away and 
replaced by a smaller load at the right offset. This meant we didn't really 
test the vector_splice ISD node.

**This is a chained PR**
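
As background, the element-level behaviour of a splice can be modelled in a few lines. The sketch below follows the documented semantics of `llvm.vector.splice` for two equal-length vectors; the function names `vectorSplice` and `extractSubvector` are hypothetical, and plain `std::vector<int>` stands in for fixed or scalable vector types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a splice of two equal-length vectors A and B with N elements:
//   Imm >= 0: elements [Imm, Imm + N) of concat(A, B)
//   Imm <  0: the last -Imm elements of A, followed by leading elements of B
std::vector<int> vectorSplice(const std::vector<int> &A,
                              const std::vector<int> &B, int Imm) {
  assert(A.size() == B.size() && "operands must have the same length");
  const int N = static_cast<int>(A.size());
  assert(Imm >= -N && Imm < N && "immediate out of range");
  const int Start = Imm >= 0 ? Imm : N + Imm;
  std::vector<int> Result;
  Result.reserve(A.size());
  for (int I = 0; I < N; ++I) {
    const int Pos = Start + I;
    Result.push_back(Pos < N ? A[Pos] : B[Pos - N]);
  }
  return Result;
}

// Contiguous subvector extraction, the pattern the fixed-length tests use.
std::vector<int> extractSubvector(const std::vector<int> &V,
                                  std::size_t Offset, std::size_t Len) {
  assert(Offset + Len <= V.size() && "subvector out of range");
  const auto First = V.begin() + static_cast<std::ptrdiff_t>(Offset);
  return std::vector<int>(First, First + static_cast<std::ptrdiff_t>(Len));
}
```

In this model, extracting the upper half of a vector yields the same elements as the leading half of a splice of the vector with itself at `Imm = N / 2`.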

From a6be08b2dd026b6b3dcd7ca8ed5e231671a160b3 Mon Sep 17 00:00:00 2001
From: Gaëtan Bossu
Date: Wed, 6 Aug 2025 10:32:44 +
Subject: [PATCH] [AArch64][ISel] Extend vector_splice tests (NFC)

They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.

In the previous tests using ld+extract+st, the extract was optimized
away and replaced by a smaller load at the right offset. This meant
we didn't really test the vector_splice ISD node.
---
 .../sve-fixed-length-extract-subvector.ll | 368 +-
 .../test/CodeGen/AArch64/sve-vector-splice.ll | 162 
 2 files changed, 526 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-vector-splice.ll

diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll 
b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
index 2dd3269a2..800f95d97af4c 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
@@ -5,6 +5,12 @@
 
 target triple = "aarch64-unknown-linux-gnu"
 
+; Note that both the vector.extract intrinsics and SK_ExtractSubvector
+; shufflevector instructions get detected as a extract_subvector ISD node in
+; SelectionDAG. We'll test both cases for the sake of completeness, even though
+; vector.extract intrinsics should get lowered into shufflevector by the time 
we
+; reach the backend.
+
 ; i8
 
 ; Don't use SVE for 64-bit vectors.
@@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) 
vscale_range(2,0) #0 {
   ret void
 }
 
+define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 
vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:str q1, [x1]
+; CHECK-NEXT:str q0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %hi, ptr %out
+  %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %lo, ptr %out2
+  ret void
+}
+
+define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 
vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_half_unaligned:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4
+; CHECK-NEXT:str q0, [x1]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %d, ptr %out
+  ret void
+}
+
+define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, 
ptr %out4) #0 vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_quarters:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:mov z2.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24
+; CHECK-NEXT:str d1, [x1]
+; CHECK-NEXT:str d2, [x2]
+; CHECK-NEXT:str d0, [x3]
+; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:str d0, [x4]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %hilo, ptr %out
+  %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %hihi, ptr %out2
+  %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %lolo, ptr %out3
+  %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %lohi, ptr %out4
+  ret void
+}
+
 define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
 ; CHECK-LABEL: extract_subvector_v64i8:
 ; CHECK:   // %bb.0:
@@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
   ret void
 }
 
+define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 
vscale_range(4,4) {
+; CHECK-LABEL: extract_v64i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:ptrue p0.b, vl32
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32
+; CHECK-NEXT:st1b { z1.b }, p0, [x1]
+; CHECK-NEXT:st1b { z0.b }, p0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <64 x i8>, ptr %in
+  %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> 
+  store <32 x i8> %hi, ptr

[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Gaëtan Bossu (gbossu)


Changes

They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.

In the previous tests using ld+extract+st, the extract was optimized away and 
replaced by a smaller load at the right offset. This meant we didn't really 
test the vector_splice ISD node.

**This is a chained PR**

---

Patch is 27.84 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152553.diff


2 Files Affected:

- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll 
(+364-4) 
- (added) llvm/test/CodeGen/AArch64/sve-vector-splice.ll (+162) 


``diff
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll 
b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
index 2dd3269a2..800f95d97af4c 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
@@ -5,6 +5,12 @@
 
 target triple = "aarch64-unknown-linux-gnu"
 
+; Note that both the vector.extract intrinsics and SK_ExtractSubvector
+; shufflevector instructions get detected as a extract_subvector ISD node in
+; SelectionDAG. We'll test both cases for the sake of completeness, even though
+; vector.extract intrinsics should get lowered into shufflevector by the time 
we
+; reach the backend.
+
 ; i8
 
 ; Don't use SVE for 64-bit vectors.
@@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) 
vscale_range(2,0) #0 {
   ret void
 }
 
+define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 
vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:str q1, [x1]
+; CHECK-NEXT:str q0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %hi, ptr %out
+  %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %lo, ptr %out2
+  ret void
+}
+
+define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 
vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_half_unaligned:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4
+; CHECK-NEXT:str q0, [x1]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> 
+  store <16 x i8> %d, ptr %out
+  ret void
+}
+
+define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, 
ptr %out4) #0 vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_quarters:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:mov z2.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24
+; CHECK-NEXT:str d1, [x1]
+; CHECK-NEXT:str d2, [x2]
+; CHECK-NEXT:str d0, [x3]
+; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:str d0, [x4]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %hilo, ptr %out
+  %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %hihi, ptr %out2
+  %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %lolo, ptr %out3
+  %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> 
+  store <8 x i8> %lohi, ptr %out4
+  ret void
+}
+
 define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
 ; CHECK-LABEL: extract_subvector_v64i8:
 ; CHECK:   // %bb.0:
@@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
   ret void
 }
 
+define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 
vscale_range(4,4) {
+; CHECK-LABEL: extract_v64i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:ptrue p0.b, vl32
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32
+; CHECK-NEXT:st1b { z1.b }, p0, [x1]
+; CHECK-NEXT:st1b { z0.b }, p0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <64 x i8>, ptr %in
+  %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> 
+  store <32 x i8> %hi, ptr %out
+  %lo = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> 
+  store <32 x i8> %lo, ptr %out2
+  ret void
+}
+
 define void @extract_subvector_v128i8(ptr %a, ptr %b) vscale_range(8,0) #0 {
 ; CHECK-LABEL: extract_subvector_v128i8:
 ; CHECK:   // %bb.0:
@@ -117,6 +203,24 @@ define void @extract_subvector_v16i16(ptr %a, ptr %b) 
vscale_range(2,0) #0 {
   ret void
 }
 
+define void @extract_v16i16_halves(ptr %in, ptr

[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Gaëtan Bossu (gbossu)


Changes

The patch changes existing patterns to select the EXT_ZZZI pseudo
instead of the EXT_ZZI destructive instruction for vector_splice.

Given that registers aren't tied anymore, this gives the register
allocator more freedom and a lot of MOVs get replaced with MOVPRFX.

In some cases however, we could have just chosen the same input and
output register, but regalloc preferred not to. This means we end up
with some test cases now having more instructions: there is now a
MOVPRFX while no MOV was previously needed.
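
The difference between a destructive (tied) and a constructive form can be illustrated with a toy model. This is a sketch only: `ZReg`, `extBytes`, `extDestructive`, and `extConstructive` are hypothetical names, registers are modelled as byte vectors, and nothing here reflects how the pseudo instruction is actually expanded by the backend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using ZReg = std::vector<uint8_t>; // toy model of a vector register

// Bytes [Imm, Imm + N) of the concatenation Lo:Hi, where N is the length.
ZReg extBytes(const ZReg &Lo, const ZReg &Hi, std::size_t Imm) {
  assert(Lo.size() == Hi.size() && "operands must have the same length");
  assert(Imm <= Lo.size() && "immediate out of range for this model");
  ZReg Result;
  Result.reserve(Lo.size());
  for (std::size_t I = 0; I < Lo.size(); ++I) {
    const std::size_t Pos = Imm + I;
    Result.push_back(Pos < Lo.size() ? Lo[Pos] : Hi[Pos - Lo.size()]);
  }
  return Result;
}

// Destructive form: the first source is also the destination, so its
// previous value is overwritten.
void extDestructive(ZReg &Zdn, const ZReg &Zm, std::size_t Imm) {
  Zdn = extBytes(Zdn, Zm, Imm);
}

// Constructive form: the destination is independent of both sources.
void extConstructive(ZReg &Zd, const ZReg &Zn, const ZReg &Zm,
                     std::size_t Imm) {
  Zd = extBytes(Zn, Zm, Imm);
}
```

With the destructive form, a caller that still needs the first source afterwards has to copy it into a scratch register first, which corresponds to the extra MOV mentioned above; the constructive form leaves both sources intact and lets the destination be chosen freely.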

---

Patch is 154.60 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152554.diff


21 Files Affected:

- (modified) llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp (+7-3) 
- (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+4-4) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll 
(+21-20) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-fp-to-int.ll (+24-20) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-extends.ll (+30-24) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll (+20-20) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-int-to-fp.ll (+24-20) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-limit-duplane.ll (+8-6) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-masked-loads.ll 
(+70-56) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-partial-reduce.ll 
(+14-14) 
- (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll (+21-20) 
- (modified) llvm/test/CodeGen/AArch64/sve-pr92779.ll (+9-9) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-extend-trunc.ll 
(+15-12) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-to-int.ll 
(+150-136) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll 
(+413-327) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll (+108-108) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll 
(+152-132) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-limit-duplane.ll 
(+8-7) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll 
(+14-12) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll 
(+20-18) 
- (modified) 
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-reductions.ll 
(+52-42) 


``diff
diff --git a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp 
b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
index cdf2822f3ed9d..b7d69b68af4ee 100644
--- a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
@@ -53,9 +53,6 @@ bool 
AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
 return false;
 
-  AArch64FunctionInfo *FuncInfo = MF.getInfo();
-  if (!FuncInfo->hasStreamingModeChanges())
-return false;
 
   MRI = &MF.getRegInfo();
   LIS = &getAnalysis().getLIS();
@@ -86,6 +83,13 @@ bool 
AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
 Changed = true;
 break;
   }
+  case AArch64::EXT_ZZZI:
+Register DstReg = MI.getOperand(0).getReg();
+Register SrcReg1 = MI.getOperand(1).getReg();
+if (SrcReg1 != DstReg) {
+  MRI->setRegAllocationHint(DstReg, 0, SrcReg1);
+}
+break;
   }
 }
   }
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td 
b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 85e647af6684c..a3ca0cb73cd43 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -2135,19 +2135,19 @@ let Predicates = [HasSVE_or_SME] in {
   // Splice with lane bigger or equal to 0
   foreach VT = [nxv16i8] in
 def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 
i32:$index,
-  (EXT_ZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
+  (EXT_ZZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
 
   foreach VT = [nxv8i16, nxv8f16, nxv8bf16] in
 def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_127 
i32:$index,
-  (EXT_ZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
+  (EXT_ZZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
 
   foreach VT = [nxv4i32, nxv4f16, nxv4f32, nxv4bf16] in
 def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_63 
i32:$index,
-  (EXT_ZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
+  (EXT_ZZZI  ZPR:$Z1, ZPR:$Z2, imm0_255:$index)>;
 
   foreach VT = [nxv2i64, nxv2f16, nxv2f32, nxv2f64, nxv2bf16] in
 def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_31 
i32:$index,
-  

[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/151989

From e9b6766c5fbfd25b5acfc686cbdc41f8dd727b03 Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Thu, 31 Jul 2025 19:48:15 +0100
Subject: [PATCH 1/2] [MLIR][OpenMP] Add a new AutomapToTargetData conversion
 pass in FIR

Add a new AutomapToTargetData pass. This gathers the declare target enter 
variables which have the AUTOMAP modifier.
And adds omp.declare_target_enter/exit mapping directives for fir.alloca and 
fir.free operations on the AUTOMAP enabled variables.
---
 .../include/flang/Optimizer/OpenMP/Passes.td  |  11 ++
 .../Optimizer/OpenMP/AutomapToTargetData.cpp  | 171 ++
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../Transforms/omp-automap-to-target-data.fir |  40 
 .../fortran/declare-target-automap.f90|  36 
 6 files changed, 265 insertions(+), 6 deletions(-)
 create mode 100644 flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
 create mode 100644 flang/test/Transforms/omp-automap-to-target-data.fir
 create mode 100644 offload/test/offloading/fortran/declare-target-automap.f90

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 704faf0ccd856..0bff58f0f6394 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -112,4 +112,15 @@ def GenericLoopConversionPass
   ];
 }
 
+def AutomapToTargetDataPass
+: Pass<"omp-automap-to-target-data", "::mlir::ModuleOp"> {
+  let summary = "Insert OpenMP target data operations for AUTOMAP variables";
+  let description = [{
+Inserts `omp.target_enter_data` and `omp.target_exit_data` operations to
+map variables marked with the `AUTOMAP` modifier when their allocation
+or deallocation is detected in the FIR.
+  }];
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp 
b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
new file mode 100644
index 0..c4937f1e90ee3
--- /dev/null
+++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
@@ -0,0 +1,171 @@
+//===- AutomapToTargetData.cpp ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/KindMapping.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/Pass/Pass.h"
+#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+using namespace mlir;
+
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {
+assert(isa(var.getType()) &&
+   "only pointer like types expected");
+Type t = fir::unwrapRefType(var.getType());
+if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t))
+  return fir::hasDynamicSize(inner);
+return fir::hasDynamicSize(t);
+  }
+
+  // Generate MapBoundsOp operations for the variable if required.
+  void genBoundsOps(fir::FirOpBuilder &builder, Value var,
+SmallVectorImpl &boundsOps) {
+Location loc = var.getLoc();
+fir::factory::AddrAndBoundsInfo info =
+fir::factory::getDataOperandBaseAddr(builder, var,
+ /*isOptional=*/false, loc);
+fir::ExtendedValue exv =
+hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr},
+/*contiguousHint=*/true)
+.first;
+SmallVector tmp =
+fir::factory::genImplicitBoundsOps(
+builder, info, exv, /*dataExvIsAssumedSize=*/false, loc);
+llvm::append_range(boundsOps, tmp);
+  }
+
+  void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp,
+  llvm::SmallVector &allocmems,
+  llvm::SmallVector &freemems) {
+assert(addressOfOp->hasOneUse() && "op must have single use");
+
+auto declaredRef =
+cast(*addressOfOp->getUsers().begin

[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread via llvm-branch-commits
Gaëtan Bossu


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
index b7d69b68a..9d3e9105f 100644
--- a/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PostCoalescerPass.cpp
@@ -53,7 +53,6 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
 
-
   MRI = &MF.getRegInfo();
   LIS = &getAnalysis().getLIS();
   bool Changed = false;
```




https://github.com/llvm/llvm-project/pull/152554
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 8de481913353a1e37264687d5cc73db0de19e6cc

Requested by: @Meinersbur

---
Full diff: https://github.com/llvm/llvm-project/pull/152458.diff


1 Files Affected:

- (modified) clang/lib/Driver/ToolChain.cpp (+21-8) 


```diff
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 3f9b808b2722e..07a3ae925f96d 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -837,17 +837,30 @@ void ToolChain::addFortranRuntimeLibs(const ArgList &Args,
 
 void ToolChain::addFortranRuntimeLibraryPath(const llvm::opt::ArgList &Args,
                                              ArgStringList &CmdArgs) const {
-  // Default to the /../lib directory. This works fine on the
-  // platforms that we have tested so far. We will probably have to re-fine
-  // this in the future. In particular, on some platforms, we may need to use
-  // lib64 instead of lib.
+  auto AddLibSearchPathIfExists = [&](const Twine &Path) {
+    // Linker may emit warnings about non-existing directories
+    if (!llvm::sys::fs::is_directory(Path))
+      return;
+
+    if (getTriple().isKnownWindowsMSVCEnvironment())
+      CmdArgs.push_back(Args.MakeArgString("-libpath:" + Path));
+    else
+      CmdArgs.push_back(Args.MakeArgString("-L" + Path));
+  };
+
+  // Search for flang_rt.* at the same location as clang_rt.* with
+  // LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0. On most platforms, flang_rt is
+  // located at the path returned by getRuntimePath() which is already added to
+  // the library search path. This exception is for Apple-Darwin.
+  AddLibSearchPathIfExists(getCompilerRTPath());
+
+  // Fall back to the non-resource directory /../lib. We will
+  // probably have to refine this in the future. In particular, on some
+  // platforms, we may need to use lib64 instead of lib.
   SmallString<256> DefaultLibPath =
       llvm::sys::path::parent_path(getDriver().Dir);
   llvm::sys::path::append(DefaultLibPath, "lib");
-  if (getTriple().isKnownWindowsMSVCEnvironment())
-    CmdArgs.push_back(Args.MakeArgString("-libpath:" + DefaultLibPath));
-  else
-    CmdArgs.push_back(Args.MakeArgString("-L" + DefaultLibPath));
+  AddLibSearchPathIfExists(DefaultLibPath);
 }
 
 void ToolChain::addFlangRTLibPath(const ArgList &Args,
```




https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [llvm] [llvm][cmake] Turn runtime in PROJECTS warnings into FATAL_ERROR (PR #152302)

2025-08-07 Thread David Spickett via llvm-branch-commits

https://github.com/DavidSpickett demilestoned 
https://github.com/llvm/llvm-project/pull/152302


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/152458

Backport 8de481913353a1e37264687d5cc73db0de19e6cc

Requested by: @Meinersbur

From 8a59c3705a92e904f9cdcbfe73342d6197659db0 Mon Sep 17 00:00:00 2001
From: Michael Kruse 
Date: Wed, 6 Aug 2025 16:58:08 +0200
Subject: [PATCH] [Flang] Search flang_rt in clang_rt path (#151954)

The clang/flang driver has two separate systems for finding the location of
clang_rt (simplified):

* `getCompilerRTPath()`, e.g. `../lib/clang/22/lib/windows`,
   used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0`
* `getRuntimePath()`, e.g. `../lib/clang/22/lib/x86_64-pc-windows-msvc`,
   used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1`

To simplify the search path, Flang-RT normally assumes only
`getRuntimePath()`, i.e. ignoring `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR`
and always using the `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1` mechanism.
There is an exception for Apple Darwin triples where `getRuntimePath()`
returns nothing. The flang-rt/compiler-rt CMake code for library
location also ignores `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` but uses the
`LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0` path instead. Since only
`getRuntimePath()` is automatically added to the linker command line,
this patch explicitly adds `getCompilerRTPath()` to the path when
linking flang_rt.

Fixes #151031

(cherry picked from commit 8de481913353a1e37264687d5cc73db0de19e6cc)
---
 clang/lib/Driver/ToolChain.cpp | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 3f9b808b2722e..07a3ae925f96d 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -837,17 +837,30 @@ void ToolChain::addFortranRuntimeLibs(const ArgList &Args,
 
 void ToolChain::addFortranRuntimeLibraryPath(const llvm::opt::ArgList &Args,
                                              ArgStringList &CmdArgs) const {
-  // Default to the /../lib directory. This works fine on the
-  // platforms that we have tested so far. We will probably have to re-fine
-  // this in the future. In particular, on some platforms, we may need to use
-  // lib64 instead of lib.
+  auto AddLibSearchPathIfExists = [&](const Twine &Path) {
+    // Linker may emit warnings about non-existing directories
+    if (!llvm::sys::fs::is_directory(Path))
+      return;
+
+    if (getTriple().isKnownWindowsMSVCEnvironment())
+      CmdArgs.push_back(Args.MakeArgString("-libpath:" + Path));
+    else
+      CmdArgs.push_back(Args.MakeArgString("-L" + Path));
+  };
+
+  // Search for flang_rt.* at the same location as clang_rt.* with
+  // LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0. On most platforms, flang_rt is
+  // located at the path returned by getRuntimePath() which is already added to
+  // the library search path. This exception is for Apple-Darwin.
+  AddLibSearchPathIfExists(getCompilerRTPath());
+
+  // Fall back to the non-resource directory /../lib. We will
+  // probably have to refine this in the future. In particular, on some
+  // platforms, we may need to use lib64 instead of lib.
   SmallString<256> DefaultLibPath =
       llvm::sys::path::parent_path(getDriver().Dir);
   llvm::sys::path::append(DefaultLibPath, "lib");
-  if (getTriple().isKnownWindowsMSVCEnvironment())
-    CmdArgs.push_back(Args.MakeArgString("-libpath:" + DefaultLibPath));
-  else
-    CmdArgs.push_back(Args.MakeArgString("-L" + DefaultLibPath));
+  AddLibSearchPathIfExists(DefaultLibPath);
 }
 
 void ToolChain::addFlangRTLibPath(const ArgList &Args,
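
For readers without an LLVM checkout, the guard introduced by this patch can be sketched in standalone C++. This is only an illustration under assumptions: it uses std::filesystem instead of llvm::sys::fs, and the names addLibSearchPathIfExists, cmdArgs and isMSVC are hypothetical stand-ins for the lambda, the driver's argument list and getTriple().isKnownWindowsMSVCEnvironment(). It is not the driver's actual API.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for the AddLibSearchPathIfExists lambda in the patch.
// Appends a linker search flag for `path` only when the directory exists, so
// the linker is never handed a non-existing directory.
// `isMSVC` plays the role of getTriple().isKnownWindowsMSVCEnvironment().
void addLibSearchPathIfExists(std::vector<std::string> &cmdArgs,
                              const std::filesystem::path &path, bool isMSVC) {
  std::error_code ec; // non-throwing overload: errors count as "not a directory"
  if (!std::filesystem::is_directory(path, ec))
    return;
  // MSVC-style linkers spell the flag -libpath:, others use -L.
  cmdArgs.push_back((isMSVC ? "-libpath:" : "-L") + path.string());
}
```

Calling the helper once for the compiler-rt style directory and once for the fallback lib directory mirrors the order used in the patch; a missing directory simply contributes no flag.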



[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:

@carlocab What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread Carlo Cabrera via llvm-branch-commits

https://github.com/carlocab approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread Carlo Cabrera via llvm-branch-commits

carlocab wrote:

Probably needs merged by a release manager?

https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Use device shared memory for arg structures (PR #150925)

2025-08-07 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

> Having said that callbacks are all over the place in `OMPIRBuilder`.

There is a term for it: [Callback hell](https://en.wiktionary.org/wiki/callback_hell)

https://github.com/llvm/llvm-project/pull/150925


[llvm-branch-commits] [clang] release/21.x: [Flang] Search flang_rt in clang_rt path (#151954) (PR #152458)

2025-08-07 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

> Probably needs merged by a release manager?

Yes, the release manager's workflow is detailed here: 
https://llvm.org/docs/HowToReleaseLLVM.html#triaging-bug-reports-for-releases

https://github.com/llvm/llvm-project/pull/152458


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+//===- AutomapToTargetData.cpp ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/KindMapping.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/Pass/Pass.h"
+#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+using namespace mlir;
+
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {
+assert(isa(var.getType()) &&
+   "only pointer like types expected");
+Type t = fir::unwrapRefType(var.getType());
+if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t))
+  return fir::hasDynamicSize(inner);
+return fir::hasDynamicSize(t);
+  }
+
+  // Generate MapBoundsOp operations for the variable if required.
+  void genBoundsOps(fir::FirOpBuilder &builder, Value var,
+SmallVectorImpl &boundsOps) {
+Location loc = var.getLoc();
+fir::factory::AddrAndBoundsInfo info =
+fir::factory::getDataOperandBaseAddr(builder, var,
+ /*isOptional=*/false, loc);
+fir::ExtendedValue exv =
+hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr},
+/*contiguousHint=*/true)
+.first;
+SmallVector tmp =
+fir::factory::genImplicitBoundsOps(
+builder, info, exv, /*dataExvIsAssumedSize=*/false, loc);
+llvm::append_range(boundsOps, tmp);
+  }
+
+  void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp,
+  llvm::SmallVector &allocmems,
+  llvm::SmallVector &freemems) {
+assert(addressOfOp->hasOneUse() && "op must have single use");
+
+auto declaredRef =
+cast(*addressOfOp->getUsers().begin())->getResult(0);
+
+for (Operation *refUser : declaredRef.getUsers()) {
+  if (auto storeOp = dyn_cast(refUser))
+if (auto emboxOp = storeOp.getValue().getDefiningOp())
+  if (auto allocmemOp =
+  emboxOp.getOperand(0).getDefiningOp())
+allocmems.push_back(storeOp);
+
+  if (auto loadOp = dyn_cast(refUser))
+for (Operation *loadUser : loadOp.getResult().getUsers())
+  if (auto boxAddrOp = dyn_cast(loadUser))
+for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers())
+  if (auto freememOp = dyn_cast(boxAddrUser))
+freemems.push_back(loadOp);
+}
+  }
+
+  void runOnOperation() override {
+ModuleOp module = getOperation()->getParentOfType();
+if (!module)
+  module = dyn_cast(getOperation());
+if (!module)
+  return;
+
+// Build FIR builder for helper utilities.
+fir::KindMapping kindMap = fir::getKindMapping(module);
+fir::FirOpBuilder builder{module, std::move(kindMap)};
+
+// Collect global variables with AUTOMAP flag.
+llvm::DenseSet automapGlobals;
+module.walk([&](fir::GlobalOp globalOp) {
+  if (auto iface =
+  dyn_cast(globalOp.getOperation()))
+if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap())
+  automapGlobals.insert(globalOp);
+});
+
+for (fir::GlobalOp globalOp : automapGlobals)
+  if (auto uses = globalOp.getSymbolUses(module.getOperation()))
+for (auto &x : *uses)
+  if (auto addrOp = dyn_cast(x.getUser())) {
+llvm::SmallVector allocstores;
+llvm::SmallVector freememloads;
+findRelatedAllocmemFreemem(addrOp, allocstores, freememloads);
+
+for (auto storeOp : allocstores) {

skatrak wrote:

There's quite some code duplication between these two loops. I think it's worth 
refactoring into a lambda or template function.
```c++
auto processTargetDataClauses =
    [&](auto op, llvm::omp::OpenMPOffloadMappingFlags flags)
    -> omp::TargetEnterExitUpdateDataOperands {
  ...
};
for (auto storeOp : allocmemStores) {
  auto clauses = processLoadStore(storeOp);
  builder.create(storeOp.getLoc(), clauses);
}
for (au
```

[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+for (fir::GlobalOp globalOp : automapGlobals)
+  if (auto uses = globalOp.getSymbolUses(module.getOperation()))
+for (auto &x : *uses)
+  if (auto addrOp = dyn_cast(x.getUser())) {
+llvm::SmallVector allocstores;
+llvm::SmallVector freememloads;
+findRelatedAllocmemFreemem(addrOp, allocstores, freememloads);

skatrak wrote:

Would it be possible to first gather all stores and loads for all uses and then 
process them? That way we wouldn't have to allocate/deallocate these lists for 
each use. Something like:
```c++
for (fir::GlobalOp globalOp : automapGlobals) {
  if (auto uses = globalOp.getSymbolUses(module.getOperation())) {
    llvm::SmallVector allocmemStores;
    llvm::SmallVector freememLoads;
    for (auto &x : *uses)
      if (auto addrOp = dyn_cast(x.getUser()))
        fi
```

[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+// Collect global variables with AUTOMAP flag.
+llvm::DenseSet automapGlobals;
+module.walk([&](fir::GlobalOp globalOp) {
+  if (auto iface =
+  dyn_cast(globalOp.getOperation()))
+if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap())

skatrak wrote:

I think this is missing a check for the declare target type: 
`iface.getDeclareTargetDeviceType()`. Otherwise, this results in mapping 
`declare target device_type(host)` globals.
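
A rough standalone sketch of the extra condition, assuming a toy model rather than the MLIR interface: DeviceType, GlobalInfo and shouldAutomap are hypothetical names made up for illustration, and the enum only mirrors the OpenMP `device_type` clause values (any, host, nohost).

```cpp
// Hypothetical mirror of the OpenMP declare target device_type values.
enum class DeviceType { Any, Host, NoHost };

// Toy stand-in for the declare-target information attached to a global.
struct GlobalInfo {
  bool isDeclareTarget;
  bool automap;
  DeviceType deviceType;
};

// Returns true when the global should be auto-mapped: it is declare target,
// carries the automap flag, and is not restricted to the host.
bool shouldAutomap(const GlobalInfo &g) {
  return g.isDeclareTarget && g.automap && g.deviceType != DeviceType::Host;
}
```

With this shape, a `device_type(host)` global is skipped even when it is marked automap, which is the case the current condition lets through.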

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+for (fir::GlobalOp globalOp : automapGlobals)
+  if (auto uses = globalOp.getSymbolUses(module.getOperation()))
+for (auto &x : *uses)
+  if (auto addrOp = dyn_cast(x.getUser())) {
+llvm::SmallVector allocstores;
+llvm::SmallVector freememloads;

skatrak wrote:

```suggestion
llvm::SmallVector allocmemStores;
llvm::SmallVector freememLoads;
```

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {

skatrak wrote:

I agree. Maybe flang/include/flang/Support/OpenMP-utils.h and 
flang/lib/Support/OpenMP-utils.cpp could be where this logic can be shared 
between lowering and this pass.

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,40 @@
+// RUN: fir-opt --omp-automap-to-target-data %s | FileCheck %s
+// Test OMP AutomapToTargetData pass.
+
+module {
+  fir.global
+  @_QMtestEarr{omp.declare_target = #omp.declaretarget} target
+   : !fir.box>>
+
+  func.func @automap() {
+%c0 = arith.constant 0 : index
+%c10 = arith.constant 10 : i32
+%addr = fir.address_of(@_QMtestEarr) : 
!fir.ref>>>
+%decl:2 = hlfir.declare %addr {fortran_attrs = #fir.var_attrs, uniq_name = "_QMtestEarr"} : 
(!fir.ref>>>) -> 
(!fir.ref>>>, 
!fir.ref>>>)
+%idx = fir.convert %c10 : (i32) -> index
+%cond = arith.cmpi sgt, %idx, %c0 : index
+%n = arith.select %cond, %idx, %c0 : index
+%mem = fir.allocmem !fir.array, %n {fir.must_be_heap = true}
+%shape = fir.shape %n : (index) -> !fir.shape<1>
+%box = fir.embox %mem(%shape) : (!fir.heap>, 
!fir.shape<1>) -> !fir.box>>
+fir.store %box to %decl#0 : 
!fir.ref>>>
+%ld = fir.load %decl#0 : !fir.ref>>>
+%base = fir.box_addr %ld : (!fir.box>>) -> 
!fir.heap>
+fir.freemem %base : !fir.heap>
+%undef = fir.zero_bits !fir.heap>
+%sh0 = fir.shape %c0 : (index) -> !fir.shape<1>
+%empty = fir.embox %undef(%sh0) : (!fir.heap>, 
!fir.shape<1>) -> !fir.box>>
+fir.store %empty to %decl#0 : 
!fir.ref>>>
+return
+  }
+}
+
+// CHECK-LABEL: func.func @automap()
+// CHECK: fir.allocmem
+// CHECK: fir.store
+// CHECK: omp.map.info {{.*}}map_clauses(to)
+// CHECK: omp.target_enter_data
+// CHECK: omp.map.info {{.*}}map_clauses(delete)
+// CHECK: omp.target_exit_data
+// CHECK: fir.freemem

skatrak wrote:

Nit: I think we should also test how values defined by these ops are passed to 
the other ops, not just checking that the expected ops are there. Also it would 
be good to check that uses of the global variable are placed between the 
`omp.target_enter_data` and `omp.target_exit_data`.

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -316,13 +316,13 @@ void createOpenMPFIRPassPipeline(mlir::PassManager &pm,
 pm.addPass(flangomp::createDoConcurrentConversionPass(
 opts.doConcurrentMappingKind == DoConcurrentMappingKind::DCMK_Device));
 
-  // The MapsForPrivatizedSymbols pass needs to run before
-  // MapInfoFinalizationPass because the former creates new
-  // MapInfoOp instances, typically for descriptors.
-  // MapInfoFinalizationPass adds MapInfoOp instances for the descriptors
-  // underlying data which is necessary to access the data on the offload
-  // target device.
+  // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass needs to run
+  // before MapInfoFinalizationPass because the former creates new MapInfoOp

skatrak wrote:

```suggestion
  // before MapInfoFinalizationPass because they create new MapInfoOp
```

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -0,0 +1,171 @@
+//===- AutomapToTargetData.cpp ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/KindMapping.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/Pass/Pass.h"
+#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+using namespace mlir;
+
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {
+assert(isa(var.getType()) &&
+   "only pointer like types expected");
+Type t = fir::unwrapRefType(var.getType());
+if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t))
+  return fir::hasDynamicSize(inner);
+return fir::hasDynamicSize(t);
+  }
+
+  // Generate MapBoundsOp operations for the variable if required.
+  void genBoundsOps(fir::FirOpBuilder &builder, Value var,
+SmallVectorImpl &boundsOps) {
+Location loc = var.getLoc();
+fir::factory::AddrAndBoundsInfo info =
+fir::factory::getDataOperandBaseAddr(builder, var,
+ /*isOptional=*/false, loc);
+fir::ExtendedValue exv =
+hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr},
+/*contiguousHint=*/true)
+.first;
+SmallVector tmp =
+fir::factory::genImplicitBoundsOps(
+builder, info, exv, /*dataExvIsAssumedSize=*/false, loc);
+llvm::append_range(boundsOps, tmp);
+  }
+
+  void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp,
+  llvm::SmallVector &allocmems,
+  llvm::SmallVector &freemems) {
+assert(addressOfOp->hasOneUse() && "op must have single use");
+
+auto declaredRef =
+cast(*addressOfOp->getUsers().begin())->getResult(0);
+
+for (Operation *refUser : declaredRef.getUsers()) {
+  if (auto storeOp = dyn_cast(refUser))
+if (auto emboxOp = storeOp.getValue().getDefiningOp())
+  if (auto allocmemOp =
+  emboxOp.getOperand(0).getDefiningOp())
+allocmems.push_back(storeOp);
+
+  if (auto loadOp = dyn_cast(refUser))
+for (Operation *loadUser : loadOp.getResult().getUsers())
+  if (auto boxAddrOp = dyn_cast(loadUser))
+for (Operation *boxAddrUser : boxAddrOp.getResult().getUsers())
+  if (auto freememOp = dyn_cast(boxAddrUser))
+freemems.push_back(loadOp);
+}
+  }
+
+  void runOnOperation() override {
+ModuleOp module = getOperation()->getParentOfType();
+if (!module)
+  module = dyn_cast(getOperation());
+if (!module)
+  return;
+
+// Build FIR builder for helper utilities.
+fir::KindMapping kindMap = fir::getKindMapping(module);
+fir::FirOpBuilder builder{module, std::move(kindMap)};
+
+// Collect global variables with AUTOMAP flag.
+llvm::DenseSet automapGlobals;
+module.walk([&](fir::GlobalOp globalOp) {
+  if (auto iface =
+  dyn_cast(globalOp.getOperation()))
+if (iface.isDeclareTarget() && iface.getDeclareTargetAutomap())
+  automapGlobals.insert(globalOp);
+});
+
+for (fir::GlobalOp globalOp : automapGlobals)
+  if (auto uses = globalOp.getSymbolUses(module.getOperation()))
+for (auto &x : *uses)

skatrak wrote:

Nit: Use braces here: 
[link](https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements).

> Similarly, braces should be used when a single-statement body is complex 
> enough that it becomes difficult to see where the block containing the 
> following statement began.
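
To illustrate the rule quoted above, here is a minimal, self-contained sketch. It is not the code from the patch: `Use` and `countAddressOfUses` are hypothetical names that only mimic the shape of the nested `for`/`if` chain under review. Bodies that span several lines are braced, while the innermost single-statement `if` is left unbraced.

```cpp
#include <vector>

// Hypothetical stand-in for a symbol use; not a Flang or MLIR type.
struct Use {
  bool isAddressOf;
};

// Counts address-of uses across all globals. The outer `for`, the `if`, and
// the inner `for` each have a body that spans several lines, so they are
// braced. The innermost `if` has a simple one-line body and stays unbraced.
int countAddressOfUses(const std::vector<std::vector<Use>> &usesPerGlobal) {
  int count = 0;
  for (const std::vector<Use> &uses : usesPerGlobal) {
    if (!uses.empty()) {
      for (const Use &use : uses) {
        if (use.isAddressOf)
          ++count;
      }
    }
  }
  return count;
}
```

With braces on the outer levels, it is easy to see where each block ends, even when the loop body grows later.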

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits


@@ -316,13 +316,13 @@ void createOpenMPFIRPassPipeline(mlir::PassManager &pm,
 pm.addPass(flangomp::createDoConcurrentConversionPass(
 opts.doConcurrentMappingKind == DoConcurrentMappingKind::DCMK_Device));
 
-  // The MapsForPrivatizedSymbols pass needs to run before
-  // MapInfoFinalizationPass because the former creates new
-  // MapInfoOp instances, typically for descriptors.
-  // MapInfoFinalizationPass adds MapInfoOp instances for the descriptors
-  // underlying data which is necessary to access the data on the offload
-  // target device.
+  // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass needs to run

skatrak wrote:

```suggestion
  // The MapsForPrivatizedSymbols and AutomapToTargetDataPass pass need to run
```

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak commented:

Thank you Akash, a couple of minor comments from me.

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak edited 
https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)

2025-08-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/152493


[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu edited 
https://github.com/llvm/llvm-project/pull/152553


[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -0,0 +1,162 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mattr=+sve  -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Test vector_splice patterns.
+; Note that this test is similar to named-vector-shuffles-sve.ll, but it 
focuses
+; on testing all supported types, and a positive "splice index".
+
+
+; i8 elements
+define  @splice_nxv16i8( %a,  %b) {
+; CHECK-LABEL: splice_nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #1
+; CHECK-NEXT:ret
+  %res = call  @llvm.vector.splice.nxv16i8( %a,  %b, i32 1)
+  ret  %res
+}
+
+; i16 elements
+define  @splice_nxv8i16( %a,  %b) {
+; CHECK-LABEL: splice_nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2
+; CHECK-NEXT:ret
+  %res = call  @llvm.vector.splice.nxv8i16( %a,  %b, i32 1)
+  ret  %res
+}
+
+; bf16 elements
+
+define  @splice_nxv8bfloat( %a, 
 %b) {
+; CHECK-LABEL: splice_nxv8bfloat:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2
+; CHECK-NEXT:ret
+  %res = call  @llvm.vector.splice.nxv8bfloat( %a,  %b, i32 1)
+  ret  %res
+}
+
+define  @splice_nxv4bfloat( %a, 
 %b) {
+; CHECK-LABEL: splice_nxv4bfloat:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #4
+; CHECK-NEXT:ret
+  %res = call  @llvm.vector.splice.nxv4bfloat( %a,  %b, i32 1)
+  ret  %res
+}

gbossu wrote:

⚠️  Similar to what I had mentioned in a closed PR: 
https://github.com/llvm/llvm-project/pull/151730#discussion_r2248448988

We have patterns for `EXT_ZZI` with these "weird" types where the fixed part 
isn't 128-bit:
 - 
 - 
 - 
 - 
 - 

I'm not sure why they were here in the first place, and looking at the 
generated code, I think the patterns are wrong.

https://github.com/llvm/llvm-project/pull/152553


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -150,13 +150,14 @@ define void @fcvtzu_v16f16_v16i32(ptr %a, ptr %b) #0 {
 ; VBITS_GE_256-NEXT:mov x8, #8 // =0x8
 ; VBITS_GE_256-NEXT:ld1h { z0.h }, p0/z, [x0]
 ; VBITS_GE_256-NEXT:ptrue p0.s, vl8
-; VBITS_GE_256-NEXT:uunpklo z1.s, z0.h
-; VBITS_GE_256-NEXT:ext z0.b, z0.b, z0.b, #16
+; VBITS_GE_256-NEXT:movprfx z1, z0
+; VBITS_GE_256-NEXT:ext z1.b, z1.b, z0.b, #16
 ; VBITS_GE_256-NEXT:uunpklo z0.s, z0.h
-; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h
+; VBITS_GE_256-NEXT:uunpklo z1.s, z1.h
 ; VBITS_GE_256-NEXT:fcvtzu z0.s, p0/m, z0.h
-; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1]
-; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2]
+; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h
+; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1]
+; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1, x8, lsl #2]

gbossu wrote:

In that example, we do get one more instruction now (the `movprfx`), but I 
think the schedule is actually better because we eliminate one dependency 
between `ext` and the second `uunpklo`. Now the two `uunpklo` can execute in 
parallel.

This is the theme of the test updates in general: sometimes more 
instructions, but more freedom for the `MachineScheduler`.

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/151989

>From e9b6766c5fbfd25b5acfc686cbdc41f8dd727b03 Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Thu, 31 Jul 2025 19:48:15 +0100
Subject: [PATCH 1/2] [MLIR][OpenMP] Add a new AutomapToTargetData conversion
 pass in FIR

Add a new AutomapToTargetData pass. This gathers the declare target enter 
variables which have the AUTOMAP modifier.
And adds omp.declare_target_enter/exit mapping directives for fir.alloca and 
fir.free operations on the AUTOMAP enabled variables.
---
 .../include/flang/Optimizer/OpenMP/Passes.td  |  11 ++
 .../Optimizer/OpenMP/AutomapToTargetData.cpp  | 171 ++
 flang/lib/Optimizer/OpenMP/CMakeLists.txt |   1 +
 flang/lib/Optimizer/Passes/Pipelines.cpp  |  12 +-
 .../Transforms/omp-automap-to-target-data.fir |  40 
 .../fortran/declare-target-automap.f90|  36 
 6 files changed, 265 insertions(+), 6 deletions(-)
 create mode 100644 flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
 create mode 100644 flang/test/Transforms/omp-automap-to-target-data.fir
 create mode 100644 offload/test/offloading/fortran/declare-target-automap.f90

diff --git a/flang/include/flang/Optimizer/OpenMP/Passes.td 
b/flang/include/flang/Optimizer/OpenMP/Passes.td
index 704faf0ccd856..0bff58f0f6394 100644
--- a/flang/include/flang/Optimizer/OpenMP/Passes.td
+++ b/flang/include/flang/Optimizer/OpenMP/Passes.td
@@ -112,4 +112,15 @@ def GenericLoopConversionPass
   ];
 }
 
+def AutomapToTargetDataPass
+: Pass<"omp-automap-to-target-data", "::mlir::ModuleOp"> {
+  let summary = "Insert OpenMP target data operations for AUTOMAP variables";
+  let description = [{
+Inserts `omp.target_enter_data` and `omp.target_exit_data` operations to
+map variables marked with the `AUTOMAP` modifier when their allocation
+or deallocation is detected in the FIR.
+  }];
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
 #endif //FORTRAN_OPTIMIZER_OPENMP_PASSES
diff --git a/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp 
b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
new file mode 100644
index 0..c4937f1e90ee3
--- /dev/null
+++ b/flang/lib/Optimizer/OpenMP/AutomapToTargetData.cpp
@@ -0,0 +1,171 @@
+//===- AutomapToTargetData.cpp ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/KindMapping.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/Pass/Pass.h"
+#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+using namespace mlir;
+
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {
+assert(isa(var.getType()) &&
+   "only pointer like types expected");
+Type t = fir::unwrapRefType(var.getType());
+if (Type inner = fir::dyn_cast_ptrOrBoxEleTy(t))
+  return fir::hasDynamicSize(inner);
+return fir::hasDynamicSize(t);
+  }
+
+  // Generate MapBoundsOp operations for the variable if required.
+  void genBoundsOps(fir::FirOpBuilder &builder, Value var,
+SmallVectorImpl &boundsOps) {
+Location loc = var.getLoc();
+fir::factory::AddrAndBoundsInfo info =
+fir::factory::getDataOperandBaseAddr(builder, var,
+ /*isOptional=*/false, loc);
+fir::ExtendedValue exv =
+hlfir::translateToExtendedValue(loc, builder, hlfir::Entity{info.addr},
+/*contiguousHint=*/true)
+.first;
+SmallVector tmp =
+fir::factory::genImplicitBoundsOps(
+builder, info, exv, /*dataExvIsAssumedSize=*/false, loc);
+llvm::append_range(boundsOps, tmp);
+  }
+
+  void findRelatedAllocmemFreemem(fir::AddrOfOp addressOfOp,
+  llvm::SmallVector &allocmems,
+  llvm::SmallVector &freemems) {
+assert(addressOfOp->hasOneUse() && "op must have single use");
+
+auto declaredRef =
+cast(*addressOfOp->getUsers().begin

[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -256,12 +256,13 @@ define  
@splice_nxv2f64_last_idx( %a,
 define  @splice_nxv2i1_idx( %a,  %b) #0 {
 ; CHECK-LABEL: splice_nxv2i1_idx:
 ; CHECK:   // %bb.0:
-; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1
 ; CHECK-NEXT:mov z1.d, p0/z, #1 // =0x1
+; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1
 ; CHECK-NEXT:ptrue p0.d
-; CHECK-NEXT:ext z1.b, z1.b, z0.b, #8
-; CHECK-NEXT:and z1.d, z1.d, #0x1
-; CHECK-NEXT:cmpne p0.d, p0/z, z1.d, #0
+; CHECK-NEXT:mov z0.d, z1.d

gbossu wrote:

This is one case where we get worse due to an extra MOV that could not be 
turned into a MOVPRFX. This is alleviated in the next commit using register 
hints.

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Akash Banerjee via llvm-branch-commits


@@ -0,0 +1,40 @@
+// RUN: fir-opt --omp-automap-to-target-data %s | FileCheck %s
+// Test OMP AutomapToTargetData pass.
+
+module {
+  fir.global
+  @_QMtestEarr{omp.declare_target = #omp.declaretarget} target
+   : !fir.box>>
+
+  func.func @automap() {
+%c0 = arith.constant 0 : index
+%c10 = arith.constant 10 : i32
+%addr = fir.address_of(@_QMtestEarr) : 
!fir.ref>>>
+%decl:2 = hlfir.declare %addr {fortran_attrs = #fir.var_attrs, uniq_name = "_QMtestEarr"} : 
(!fir.ref>>>) -> 
(!fir.ref>>>, 
!fir.ref>>>)
+%idx = fir.convert %c10 : (i32) -> index
+%cond = arith.cmpi sgt, %idx, %c0 : index
+%n = arith.select %cond, %idx, %c0 : index
+%mem = fir.allocmem !fir.array, %n {fir.must_be_heap = true}
+%shape = fir.shape %n : (index) -> !fir.shape<1>
+%box = fir.embox %mem(%shape) : (!fir.heap>, 
!fir.shape<1>) -> !fir.box>>
+fir.store %box to %decl#0 : 
!fir.ref>>>
+%ld = fir.load %decl#0 : !fir.ref>>>
+%base = fir.box_addr %ld : (!fir.box>>) -> 
!fir.heap>
+fir.freemem %base : !fir.heap>
+%undef = fir.zero_bits !fir.heap>
+%sh0 = fir.shape %c0 : (index) -> !fir.shape<1>
+%empty = fir.embox %undef(%sh0) : (!fir.heap>, 
!fir.shape<1>) -> !fir.box>>
+fir.store %empty to %decl#0 : 
!fir.ref>>>
+return
+  }
+}
+
+// CHECK-LABEL: func.func @automap()
+// CHECK: fir.allocmem
+// CHECK: fir.store
+// CHECK: omp.map.info {{.*}}map_clauses(to)
+// CHECK: omp.target_enter_data
+// CHECK: omp.map.info {{.*}}map_clauses(delete)
+// CHECK: omp.target_exit_data
+// CHECK: fir.freemem

TIFitis wrote:

I've updated the test to make sure it's mapping the automap global.

The test checks that the `target_enter_data` follows the `allocmem` operation 
and that the `target_exit_data` precedes the `freemem` operation, which should 
imply that any other use of the global in between remains intact.

Let me know if you're happy with the updated test.

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -86,6 +83,13 @@ bool 
AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
 Changed = true;
 break;
   }
+  case AArch64::EXT_ZZZI:
+Register DstReg = MI.getOperand(0).getReg();
+Register SrcReg1 = MI.getOperand(1).getReg();
+if (SrcReg1 != DstReg) {
+  MRI->setRegAllocationHint(DstReg, 0, SrcReg1);
+}
+break;

gbossu wrote:

Note that this commit is really just a WIP to show we can slightly improve 
codegen with some hints. I'm not sure it should remain in that PR.

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Joshua Batista via llvm-branch-commits


@@ -3640,6 +3655,24 @@ void SemaHLSL::ActOnVariableDeclarator(VarDecl *VD) {
 
 // process explicit bindings
 processExplicitBindingsOnDecl(VD);
+
+if (VD->getType()->isHLSLResourceRecordArray()) {
+  // If the resource array does not have an explicit binding attribute,
+  // create an implicit one. It will be used to transfer implicit binding
+  // order_ID to codegen.
+  if (!VD->hasAttr()) {

bob80905 wrote:

Shouldn't this check if it's missing HLSLResourceBindingAttr? Or is this saying 
that HLSLVkBindingAttr is only added when a binding attribute is explicitly 
spelled out?

https://github.com/llvm/llvm-project/pull/152452


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota updated 
https://github.com/llvm/llvm-project/pull/152452

>From 4e153a4da8b990a1d07d6d1d63d2be74ed45e2eb Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Thu, 7 Aug 2025 00:37:23 -0700
Subject: [PATCH 1/2] [HLSL] Add implicit binding attribute to resource arrays
 without binding and make them static

If a resource array does not have an explicit binding attribute, SemaHLSL will 
add
an implicit one. The attribute will be used to transfer implicit binding order 
ID
to the codegen, the same way as it is done for HLSLBufferDecls. This is 
necessary
in order to generate correct initialization of resources in an array that does 
not
have an explicit binding.

This change also marks resource arrays declared at a global scope as `static`, 
which
is what is already done for standalone resources.
---
 clang/lib/Sema/SemaHLSL.cpp   | 57 +++
 .../test/AST/HLSL/resource_binding_attr.hlsl  | 28 +++--
 2 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/clang/lib/Sema/SemaHLSL.cpp b/clang/lib/Sema/SemaHLSL.cpp
index 873efdae38f18..ffb996e79409c 100644
--- a/clang/lib/Sema/SemaHLSL.cpp
+++ b/clang/lib/Sema/SemaHLSL.cpp
@@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) {
   llvm_unreachable("unexpected ResourceClass value");
 }
 
+static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) {
+  return getRegisterType(ResTy->getAttrs().ResourceClass);
+}
+
 // Converts the first letter of string Slot to RegisterType.
 // Returns false if the letter does not correspond to a valid register type.
 static bool convertToRegisterType(StringRef Slot, RegisterType *RT) {
@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) {
   return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray();
 }
 
+static const HLSLAttributedResourceType *
+getResourceArrayHandleType(VarDecl *VD) {
+  assert(VD->getType()->isHLSLResourceRecordArray() &&
+ "expected array of resource records");
+  const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
+  while (const ConstantArrayType *CAT = dyn_cast(Ty)) {
+Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
+  }
+  return HLSLAttributedResourceType::findHandleTypeOnResource(Ty);
+}
+
 // Returns true if the type is a leaf element type that is not valid to be
 // included in HLSL Buffer, such as a resource class, empty struct, zero-sized
 // array, or a builtin intangible type. Returns false it is a valid leaf 
element
@@ -568,16 +583,13 @@ void createHostLayoutStructForBuffer(Sema &S, 
HLSLBufferDecl *BufDecl) {
   BufDecl->addLayoutStruct(LS);
 }
 
-static void addImplicitBindingAttrToBuffer(Sema &S, HLSLBufferDecl *BufDecl,
-   uint32_t ImplicitBindingOrderID) {
-  RegisterType RT =
-  BufDecl->isCBuffer() ? RegisterType::CBuffer : RegisterType::SRV;
+static void addImplicitBindingAttrToDecl(Sema &S, Decl *D, RegisterType RT,
+ uint32_t ImplicitBindingOrderID) {
   auto *Attr =
   HLSLResourceBindingAttr::CreateImplicit(S.getASTContext(), "", "0", {});
-  std::optional RegSlot;
-  Attr->setBinding(RT, RegSlot, 0);
+  Attr->setBinding(RT, std::nullopt, 0);
   Attr->setImplicitBindingOrderID(ImplicitBindingOrderID);
-  BufDecl->addAttr(Attr);
+  D->addAttr(Attr);
 }
 
 // Handle end of cbuffer/tbuffer declaration
@@ -600,7 +612,10 @@ void SemaHLSL::ActOnFinishBuffer(Decl *Dcl, SourceLocation 
RBrace) {
 if (RBA)
   RBA->setImplicitBindingOrderID(OrderID);
 else
-  addImplicitBindingAttrToBuffer(SemaRef, BufDecl, OrderID);
+  addImplicitBindingAttrToDecl(SemaRef, BufDecl,
+   BufDecl->isCBuffer() ? RegisterType::CBuffer
+: RegisterType::SRV,
+   OrderID);
   }
 
   SemaRef.PopDeclContext();
@@ -1906,7 +1921,7 @@ static bool DiagnoseLocalRegisterBinding(Sema &S, 
SourceLocation &ArgLoc,
   if (const HLSLAttributedResourceType *AttrResType =
   HLSLAttributedResourceType::findHandleTypeOnResource(
   VD->getType().getTypePtr())) {
-if (RegType == getRegisterType(AttrResType->getAttrs().ResourceClass))
+if (RegType == getRegisterType(AttrResType))
   return true;
 
 S.Diag(D->getLocation(), diag::err_hlsl_binding_type_mismatch)
@@ -2439,8 +2454,8 @@ void 
SemaHLSL::ActOnEndOfTranslationUnit(TranslationUnitDecl *TU) {
 HLSLBufferDecl *DefaultCBuffer = HLSLBufferDecl::CreateDefaultCBuffer(
 SemaRef.getASTContext(), SemaRef.getCurLexicalContext(),
 DefaultCBufferDecls);
-addImplicitBindingAttrToBuffer(SemaRef, DefaultCBuffer,
-   getNextImplicitBindingOrderID());
+addImplicitBindingAttrToDecl(SemaRef, DefaultCBuffer, 
RegisterType::CBuffer,
+ getNextImplicitBindingOrder

[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)

2025-08-07 Thread Sam Tebbs via llvm-branch-commits


@@ -974,6 +974,11 @@ AArch64TTIImpl::getIntrinsicInstrCost(const 
IntrinsicCostAttributes &ICA,
 }
 break;
   }
+  case Intrinsic::loop_dependence_raw_mask:
+  case Intrinsic::loop_dependence_war_mask:
+if (ST->hasSVE2())
+  return 1;
+return InstructionCost::getInvalid(CostKind);

SamTebbs33 wrote:

The intrinsics do expand into a [lot of instructions](https://github.com/llvm/llvm-project/pull/117007/files#diff-d7065626b3d269e24241429ce037d51fc91d5ead5896d67fcc038aefcfd2R1806), 
so I'm keen to hear people's opinions on whether returning an invalid cost is 
better than calculating their cost, since that will probably be very high.
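
The trade-off in this question can be sketched in a few lines. This is a toy model under stated assumptions, not the LLVM cost-model API: `Cost` and `pickCheapest` are hypothetical names, and `std::nullopt` stands in for an invalid cost. The point is only that an invalid cost removes a candidate from consideration entirely, while a very high but valid cost still lets it win when nothing cheaper exists.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical cost type: std::nullopt plays the role of an invalid cost.
using Cost = std::optional<unsigned>;

// Returns the index of the cheapest candidate that has a valid cost, or
// std::nullopt if every candidate is invalid.
std::optional<std::size_t> pickCheapest(const std::vector<Cost> &candidates) {
  std::optional<std::size_t> best;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    if (!candidates[i])
      continue; // Invalid: never selectable, whatever the alternatives are.
    if (!best || *candidates[i] < *candidates[*best])
      best = i;
  }
  return best;
}
```

In this model, giving the expanded sequence a large valid cost keeps it available as a last resort, whereas an invalid cost rules it out even if it is the only option.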

https://github.com/llvm/llvm-project/pull/100579


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota created 
https://github.com/llvm/llvm-project/pull/152454

Adds support for accessing individual resources from fixed-size resource arrays 
declared at global scope. When a global resource array is indexed to retrieve a 
specific resource, the codegen translates the `ArraySubscriptExpr` AST node to 
a constructor call for the corresponding resource record type and binding.

Closes #145424
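
As a rough illustration of indexing into a fixed-size, possibly multi-dimensional array, the sketch below flattens nested subscripts into a single row-major index. It is an assumption that a flattened index of this kind is what selects the individual resource; the helper `flattenIndex` is hypothetical and does not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major flattening (hypothetical helper, not from the patch): for
// dimensions {D0, D1, D2} and subscripts {i0, i1, i2}, the flat index is
// ((i0 * D1) + i1) * D2 + i2.
std::size_t flattenIndex(const std::vector<std::size_t> &dims,
                         const std::vector<std::size_t> &indices) {
  assert(dims.size() == indices.size() && "one subscript per dimension");
  std::size_t flat = 0;
  for (std::size_t d = 0; d < dims.size(); ++d) {
    assert(indices[d] < dims[d] && "subscript out of bounds");
    flat = flat * dims[d] + indices[d];
  }
  return flat;
}
```

For example, under this scheme element `[3][1]` of a `[4][2]` array maps to flat index 7.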

>From 86902233a96b26b710bd39c096cb581f252e09a4 Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Thu, 7 Aug 2025 01:30:36 -0700
Subject: [PATCH] [HLSL] Global resource arrays element access

Adds support for accessing individual resources from fixed-size resource
arrays declared at global scope. When a global resource array is indexed
 to retrieve a specific resource, the codegen translates the 
`ArraySubscriptExpr`
into a constructor call for the corresponding resource record type and binding.

Closes #145424
---
 clang/include/clang/Sema/SemaHLSL.h   |   9 +-
 clang/lib/CodeGen/CGExpr.cpp  |  10 +
 clang/lib/CodeGen/CGHLSLRuntime.cpp   | 223 +-
 clang/lib/CodeGen/CGHLSLRuntime.h |   6 +
 clang/lib/CodeGen/CodeGenModule.cpp   |   4 +-
 clang/lib/Sema/SemaHLSL.cpp   |  93 ++--
 .../resources/res-array-global-multi-dim.hlsl |  32 +++
 .../resources/res-array-global.hlsl   |  59 +
 clang/test/CodeGenHLSL/static-local-ctor.hlsl |   5 +-
 9 files changed, 401 insertions(+), 40 deletions(-)
 create mode 100644 
clang/test/CodeGenHLSL/resources/res-array-global-multi-dim.hlsl
 create mode 100644 clang/test/CodeGenHLSL/resources/res-array-global.hlsl

diff --git a/clang/include/clang/Sema/SemaHLSL.h 
b/clang/include/clang/Sema/SemaHLSL.h
index 085c9ed9f3ebd..0c215c6e10013 100644
--- a/clang/include/clang/Sema/SemaHLSL.h
+++ b/clang/include/clang/Sema/SemaHLSL.h
@@ -229,10 +229,17 @@ class SemaHLSL : public SemaBase {
 
   void diagnoseAvailabilityViolations(TranslationUnitDecl *TU);
 
-  bool initGlobalResourceDecl(VarDecl *VD);
   uint32_t getNextImplicitBindingOrderID() {
 return ImplicitBindingNextOrderID++;
   }
+
+  bool initGlobalResourceDecl(VarDecl *VD);
+  bool initGlobalResourceArrayDecl(VarDecl *VD);
+  void createResourceRecordCtorArgs(const Type *ResourceTy, StringRef VarName,
+HLSLResourceBindingAttr *RBA,
+HLSLVkBindingAttr *VkBinding,
+uint32_t ArrayIndex,
+llvm::SmallVector &Args);
 };
 
 } // namespace clang
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index ed35a055d8a7f..8c34fb501a3b8 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -16,6 +16,7 @@
 #include "CGCall.h"
 #include "CGCleanup.h"
 #include "CGDebugInfo.h"
+#include "CGHLSLRuntime.h"
 #include "CGObjCRuntime.h"
 #include "CGOpenMPRuntime.h"
 #include "CGRecordLayout.h"
@@ -4532,6 +4533,15 @@ LValue CodeGenFunction::EmitArraySubscriptExpr(const 
ArraySubscriptExpr *E,
  LHS.getBaseInfo(), TBAAAccessInfo());
   }
 
+  // The HLSL runtime handle the subscript expression on global resource 
arrays.
+  if (getLangOpts().HLSL && (E->getType()->isHLSLResourceRecord() ||
+ E->getType()->isHLSLResourceRecordArray())) {
+std::optional LV =
+CGM.getHLSLRuntime().emitResourceArraySubscriptExpr(E, *this);
+if (LV.has_value())
+  return *LV;
+  }
+
   // All the other cases basically behave like simple offsetting.
 
   // Handle the extvector case we ignored above.
diff --git a/clang/lib/CodeGen/CGHLSLRuntime.cpp 
b/clang/lib/CodeGen/CGHLSLRuntime.cpp
index 918cb3e38448d..a09e540367a18 100644
--- a/clang/lib/CodeGen/CGHLSLRuntime.cpp
+++ b/clang/lib/CodeGen/CGHLSLRuntime.cpp
@@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion 
RootSigVer,
   RootSignatureValMD->addOperand(MDVals);
 }
 
+// If the specified expr is a simple decay from an array to pointer,
+// return the array subexpression. Otherwise, return nullptr.
+static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) {
+  const auto *CE = dyn_cast(E);
+  if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay)
+return nullptr;
+  return CE->getSubExpr();
+}
+
+// Find array variable declaration from nested array subscript AST nodes
+static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) {
+  const Expr *E = nullptr;
+  while (ASE != nullptr) {
+E = getSubExprFromArrayDecayOperand(ASE->getBase());
+if (!E)
+  return nullptr;
+ASE = dyn_cast<ArraySubscriptExpr>(E);
+  }
+  if (const DeclRefExpr *DRE = dyn_cast_or_null<DeclRefExpr>(E))
+return DRE->getDecl();
+  return nullptr;
+}
+
+// Get the total size of the array, or -1 if the array is unbounded.
+static int getTotalArraySize(const clang::Type *Ty) {
+  assert(Ty->isArrayType() && "expected array type");
+  if (

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)

2025-08-07 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/145330

>From ec5c4d315a4611383838d8b6d517dfb5a5de7806 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 17 Jun 2025 04:03:53 -0400
Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special
 cases

There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.

For SWDEV-516125.
---
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  21 +-
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |   6 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   7 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |  67 ++
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 196 ++
 6 files changed, 105 insertions(+), 194 deletions(-)
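
As a reading aid for the diff below: most hunks replace an exact `ISD::ADD` opcode test with `isAnyAdd()`, so that the same address-matching code accepts both `ADD` and `PTRADD`. A minimal, self-contained sketch of that pattern follows. The `Opcode` enum, the `Node` struct and `isGlobalPlusConstant` are assumptions made up for illustration; they are not the SelectionDAG classes.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; not LLVM's ISD opcodes or SDNode.
enum class Opcode { Add, PtrAdd, Sub, GlobalAddress, Constant };

struct Node {
  Opcode Op;
  const Node *LHS = nullptr;
  const Node *RHS = nullptr;

  // Treat integer add and pointer add alike, mirroring the intent of
  // the isAnyAdd() calls in the patch.
  bool isAnyAdd() const { return Op == Opcode::Add || Op == Opcode::PtrAdd; }
};

// Matches (add|ptradd GlobalAddress, Constant), the shape tested in the
// isMemSrcFromConstant hunk.
bool isGlobalPlusConstant(const Node &N) {
  return N.isAnyAdd() && N.LHS && N.RHS &&
         N.LHS->Op == Opcode::GlobalAddress && N.RHS->Op == Opcode::Constant;
}
```

With an exact `Op == Opcode::Add` test, the `PtrAdd` form of the same address would fall through to the generic path, which is the kind of missed match the pre-committed tests are meant to expose.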

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 649a3107cc21c..e908c50b6caed 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8389,7 +8389,7 @@ static bool isMemSrcFromConstant(SDValue Src, 
ConstantDataArraySlice &Slice) {
   GlobalAddressSDNode *G = nullptr;
   if (Src.getOpcode() == ISD::GlobalAddress)
 G = cast<GlobalAddressSDNode>(Src);
-  else if (Src.getOpcode() == ISD::ADD &&
+  else if (Src->isAnyAdd() &&
Src.getOperand(0).getOpcode() == ISD::GlobalAddress &&
Src.getOperand(1).getOpcode() == ISD::Constant) {
 G = cast<GlobalAddressSDNode>(Src.getOperand(0));
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index e235d144e85ff..6010ce78cf4d9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -632,8 +632,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned 
BitWidth,
   // operands on the new node are also disjoint.
   SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint
  : SDNodeFlags::None);
+  unsigned Opcode = Op.getOpcode();
+  if (Opcode == ISD::PTRADD) {
+// It isn't a ptradd anymore if it doesn't operate on the entire
+// pointer.
+Opcode = ISD::ADD;
+  }
   SDValue X = DAG.getNode(
-  Op.getOpcode(), dl, SmallVT,
+  Opcode, dl, SmallVT,
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)),
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags);
   assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?");
@@ -2861,6 +2867,11 @@ bool TargetLowering::SimplifyDemandedBits(
   return TLO.CombineTo(Op, And1);
 }
 [[fallthrough]];
+  case ISD::PTRADD:
+if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType())
+  break;
+// PTRADD behaves like ADD if pointers are represented as integers.
+[[fallthrough]];
   case ISD::ADD:
   case ISD::SUB: {
 // Add, Sub, and Mul don't demand any bits in positions beyond that
@@ -2970,10 +2981,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
 if (Op.getOpcode() == ISD::MUL) {
   Known = KnownBits::mul(KnownOp0, KnownOp1);
-} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB.
+} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB.
   Known = KnownBits::computeForAddSub(
-  Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(),
-  Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1);
+  Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(),
+  KnownOp0, KnownOp1);
 }
 break;
   }
@@ -5696,7 +5707,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const 
GlobalValue *&GA,
 return true;
   }
 
-  if (N->getOpcode() == ISD::ADD) {
+  if (N->isAnyAdd()) {
 SDValue N1 = N->getOperand(0);
 SDValue N2 = N->getOperand(1);
 if (isGAPlusOffset(N1.getNode(), GA, Offset)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index fb83388e5e265..aea1b9461da89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -1489,7 +1489,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, 
SDValue &Ptr, SDValue &VAddr,
   C1 = nullptr;
   }
 
-  if (N0.getOpcode() == ISD::ADD) {
+  if (N0->isAnyAdd()) {
 // (add N2, N3) -> addr64, or
 // (add (add N2, N3), C1) -> addr64
 SDValue N2 = N0.getOperand(0);
@@ -1951,7 +1951,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, 
SDValue Addr,
   }
 
   // Match the variable offset.
-  if (Addr.getOpcode() == ISD::ADD) {
+  if (Addr->isAnyAdd()) {
 LHS = Addr.getOperand(0);
 
 if (!LHS

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (PR #145329)

2025-08-07 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/145329

>From b4212e94fbf40d8b9bebdb346f7aee103f5d561e Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 17 Jun 2025 03:51:19 -0400
Subject: [PATCH] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special
 cases

Pre-committing tests to show improvements in a follow-up PR.
---
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |  63 ++
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 206 ++
 2 files changed, 269 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll

diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll
new file mode 100644
index 0..fab56383ffa8a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll
@@ -0,0 +1,63 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=1 < 
%s | FileCheck --check-prefixes=GFX6,GFX6_PTRADD %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=0 < 
%s | FileCheck --check-prefixes=GFX6,GFX6_LEGACY %s
+
+; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectMUBUF.
+
+define amdgpu_kernel void @v_add_i32(ptr addrspace(1) %out, ptr addrspace(1) 
%in) {
+; GFX6_PTRADD-LABEL: v_add_i32:
+; GFX6_PTRADD:   ; %bb.0:
+; GFX6_PTRADD-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX6_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX6_PTRADD-NEXT:s_mov_b32 s7, 0x100f000
+; GFX6_PTRADD-NEXT:s_mov_b32 s10, 0
+; GFX6_PTRADD-NEXT:s_mov_b32 s11, s7
+; GFX6_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX6_PTRADD-NEXT:v_mov_b32_e32 v1, s3
+; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, s2, v0
+; GFX6_PTRADD-NEXT:v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GFX6_PTRADD-NEXT:s_mov_b32 s8, s10
+; GFX6_PTRADD-NEXT:s_mov_b32 s9, s10
+; GFX6_PTRADD-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc
+; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0)
+; GFX6_PTRADD-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 
offset:4 glc
+; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0)
+; GFX6_PTRADD-NEXT:s_mov_b32 s6, -1
+; GFX6_PTRADD-NEXT:s_mov_b32 s4, s0
+; GFX6_PTRADD-NEXT:s_mov_b32 s5, s1
+; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, v2, v0
+; GFX6_PTRADD-NEXT:buffer_store_dword v0, off, s[4:7], 0
+; GFX6_PTRADD-NEXT:s_endpgm
+;
+; GFX6_LEGACY-LABEL: v_add_i32:
+; GFX6_LEGACY:   ; %bb.0:
+; GFX6_LEGACY-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX6_LEGACY-NEXT:s_mov_b32 s7, 0x100f000
+; GFX6_LEGACY-NEXT:s_mov_b32 s10, 0
+; GFX6_LEGACY-NEXT:s_mov_b32 s11, s7
+; GFX6_LEGACY-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX6_LEGACY-NEXT:s_waitcnt lgkmcnt(0)
+; GFX6_LEGACY-NEXT:s_mov_b64 s[8:9], s[2:3]
+; GFX6_LEGACY-NEXT:v_mov_b32_e32 v1, 0
+; GFX6_LEGACY-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc
+; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0)
+; GFX6_LEGACY-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 
offset:4 glc
+; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0)
+; GFX6_LEGACY-NEXT:s_mov_b32 s6, -1
+; GFX6_LEGACY-NEXT:s_mov_b32 s4, s0
+; GFX6_LEGACY-NEXT:s_mov_b32 s5, s1
+; GFX6_LEGACY-NEXT:v_add_i32_e32 v0, vcc, v2, v0
+; GFX6_LEGACY-NEXT:buffer_store_dword v0, off, s[4:7], 0
+; GFX6_LEGACY-NEXT:s_endpgm
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %gep = getelementptr inbounds i32, ptr addrspace(1) %in, i32 %tid
+  %b_ptr = getelementptr i32, ptr addrspace(1) %gep, i32 1
+  %a = load volatile i32, ptr addrspace(1) %gep
+  %b = load volatile i32, ptr addrspace(1) %b_ptr
+  %result = add i32 %a, %b
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+; GFX6: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index b7bfc5a7c..1a54ba716a80a 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -291,3 +291,209 @@ define ptr @fold_mul24_into_mad(ptr %base, i64 %a, i64 
%b) {
   %gep = getelementptr inbounds i8, ptr %base, i64 %mul
   ret ptr %gep
 }
+
+; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectGlobalSAddr.
+define amdgpu_kernel void @uniform_base_varying_offset_imm(ptr addrspace(1) 
%p) {
+; GFX942_PTRADD-LABEL: uniform_base_varying_offset_imm:
+; GFX942_PTRADD:   ; %bb.0: ; %entry
+; GFX942_PTRADD-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX942_PTRADD-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, 0
+; GFX942_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v2, 1
+; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_PTRADD-NEXT:v_lshl_add_u64 v[0:1], s[0:1], 0, v[0:1]
+; GFX942_PTRAD

[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/152493

Backport f73a302

Requested by: @DavidTruby

>From 332baaaee9815118a44982c1efd1dc14dc16ae6c Mon Sep 17 00:00:00 2001
From: David Truby 
Date: Thu, 7 Aug 2025 13:09:35 +0100
Subject: [PATCH] [flang-rt] Use correct flang-rt build for flang-rt unit tests
 on Windows (#152318)

Currently flang-rt assumes that LLVM was always built with the dynamic
MSVC runtime. This may not be the case, if the user has specified a
different runtime with -DCMAKE_MSVC_RUNTIME_LIBRARY. Since this flag is
implied by -DLLVM_ENABLE_RPMALLOC=On, which is used by the Windows
release script, this is causing that script to fail.

Fixes #151920

(cherry picked from commit f73a3028c2d46928280d69d9e953ff79d2eb0fbb)
---
 flang-rt/lib/runtime/CMakeLists.txt | 32 +
 flang-rt/unittests/CMakeLists.txt   |  8 
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/flang-rt/lib/runtime/CMakeLists.txt 
b/flang-rt/lib/runtime/CMakeLists.txt
index 332c0872e065f..dc2db1d9902cb 100644
--- a/flang-rt/lib/runtime/CMakeLists.txt
+++ b/flang-rt/lib/runtime/CMakeLists.txt
@@ -251,19 +251,33 @@ else()
   add_win_flangrt_runtime(STATIC dynamic MultiThreadedDLL  
INSTALL_WITH_TOOLCHAIN)
   add_win_flangrt_runtime(STATIC dynamic_dbg MultiThreadedDebugDLL 
INSTALL_WITH_TOOLCHAIN)
 
-  # Unittests link against LLVMSupport which is using CMake's default runtime
-  # library selection, which is either MultiThreadedDLL or 
MultiThreadedDebugDLL
-  # depending on the configuration. They have to match or linking will fail.
+  # Unittests link against LLVMSupport. If CMAKE_MSVC_RUNTIME_LIBRARY is set,
+  # that will have been used for LLVMSupport so it must also be used here.
+  # Otherwise this will use CMake's default runtime library selection, which
+  # is either MultiThreadedDLL or MultiThreadedDebugDLL depending on the 
configuration.
+  # They have to match or linking will fail.
   if (GENERATOR_IS_MULTI_CONFIG)
 # We cannot select an ALIAS library because it may be different
 # per configuration. Fallback to CMake's default.
 add_win_flangrt_runtime(STATIC unittest "" EXCLUDE_FROM_ALL)
   else ()
-string(TOLOWER ${CMAKE_BUILD_TYPE} build_type)
-if (build_type STREQUAL "debug")
-  add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic_dbg)
-else ()
-  add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic)
-endif ()
+# Check if CMAKE_MSVC_RUNTIME_LIBRARY was set.
+if (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreaded")
+add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.static)
+elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDLL")
+add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic)
+elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDebug")
+add_library(flang_rt.runtime.unittest ALIAS 
flang_rt.runtime.static_dbg)
+elseif (CMAKE_MSVC_RUNTIME_LIBRARY STREQUAL "MultiThreadedDebugDLL")
+add_library(flang_rt.runtime.unittest ALIAS 
flang_rt.runtime.dynamic_dbg)
+else()
+  # Default based on the build type.
+  string(TOLOWER ${CMAKE_BUILD_TYPE} build_type)
+  if (build_type STREQUAL "debug")
+  add_library(flang_rt.runtime.unittest ALIAS 
flang_rt.runtime.dynamic_dbg)
+  else ()
+  add_library(flang_rt.runtime.unittest ALIAS flang_rt.runtime.dynamic)
+  endif ()
+endif()
   endif ()
 endif()
diff --git a/flang-rt/unittests/CMakeLists.txt 
b/flang-rt/unittests/CMakeLists.txt
index 831bc8a4c2906..fd63ad11dcf43 100644
--- a/flang-rt/unittests/CMakeLists.txt
+++ b/flang-rt/unittests/CMakeLists.txt
@@ -94,14 +94,6 @@ function(add_flangrt_unittest test_dirname)
   target_link_libraries(${test_dirname} PRIVATE ${ARG_LINK_LIBS})
   add_flangrt_unittest_offload_properties(${test_dirname})
   add_flangrt_dependent_libs(${test_dirname})
-
-  # Required because LLVMSupport is compiled with this option.
-  # FIXME: According to CMake documentation, this is the default. Why is it
-  #needed? LLVM's add_unittest doesn't set it either.
-  set_target_properties(${test_dirname}
-  PROPERTIES
-MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>DLL"
-)
 endfunction()
 
 function(add_flangrt_nongtest_unittest test_name)



[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:

@Meinersbur What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/152493


[llvm-branch-commits] [llvm] release/21.x: [flang-rt] Use correct flang-rt build for flang-rt unit tests on Windows (#152318) (PR #152493)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/152493


[llvm-branch-commits] [llvm] AMDGPU: Add a few missing mfma rewrite tests (PR #149026)

2025-08-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm closed 
https://github.com/llvm/llvm-project/pull/149026


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/152452


[llvm-branch-commits] [flang] [llvm] [MLIR][OpenMP] Add a new AutomapToTargetData conversion pass in FIR (PR #151989)

2025-08-07 Thread Akash Banerjee via llvm-branch-commits


@@ -0,0 +1,171 @@
+//===- AutomapToTargetData.cpp ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Optimizer/Builder/DirectivesCommon.h"
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/HLFIRTools.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/KindMapping.h"
+#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "mlir/IR/BuiltinAttributes.h"
+#include "mlir/Pass/Pass.h"
+#include "llvm/Frontend/OpenMP/OMPConstants.h"
+#include 
+#include 
+
+namespace flangomp {
+#define GEN_PASS_DEF_AUTOMAPTOTARGETDATAPASS
+#include "flang/Optimizer/OpenMP/Passes.h.inc"
+} // namespace flangomp
+
+using namespace mlir;
+
+namespace {
+class AutomapToTargetDataPass
+: public flangomp::impl::AutomapToTargetDataPassBase<
+  AutomapToTargetDataPass> {
+  // Returns true if the variable has a dynamic size and therefore requires
+  // bounds operations to describe its extents.
+  bool needsBoundsOps(Value var) {

TIFitis wrote:

I've moved both as static functions to 
_flang/include/flang/Support/OpenMP-utils.h_. Let me know if that's alright.

https://github.com/llvm/llvm-project/pull/151989


[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#152592** 👈 (View in Graphite)
* **#152584**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.


https://github.com/llvm/llvm-project/pull/152592


[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/152592


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Helena Kotas via llvm-branch-commits


@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) {
   return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray();
 }
 
+static const HLSLAttributedResourceType *
+getResourceArrayHandleType(VarDecl *VD) {
+  assert(VD->getType()->isHLSLResourceRecordArray() &&
+ "expected array of resource records");
+  const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
+  while (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(Ty)) {
+Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
+  }

hekota wrote:

It is grabbing the array element type (i.e., the actual resource type). 
Multi-dimensional arrays are represented by nested `ConstantArrayType` 
instances, so to get to the element type the array type needs to be "unwrapped" 
in a loop.
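
To illustrate the unwrapping outside of Clang, here is a minimal sketch on a toy type model. The `Ty` struct and the helpers `makeScalar`, `makeArray` and `getInnermostElementType` are hypothetical names invented for this example; they are not Clang's `Type` hierarchy or API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy type model for illustration only; not Clang's Type hierarchy.
struct Ty {
  std::string Name;            // set for non-array (element) types
  std::size_t Size = 0;        // element count when this is an array
  std::shared_ptr<Ty> Element; // non-null when this is an array type
  bool isArray() const { return Element != nullptr; }
};

std::shared_ptr<Ty> makeScalar(std::string Name) {
  auto T = std::make_shared<Ty>();
  T->Name = std::move(Name);
  return T;
}

std::shared_ptr<Ty> makeArray(std::shared_ptr<Ty> Elem, std::size_t N) {
  auto T = std::make_shared<Ty>();
  T->Size = N;
  T->Element = std::move(Elem);
  return T;
}

// Peel off one array layer per iteration until the innermost element
// type is reached; a non-array type is returned unchanged.
const Ty *getInnermostElementType(const Ty *T) {
  while (T->isArray())
    T = T->Element.get();
  return T;
}
```

A two-dimensional array such as `makeArray(makeArray(Buf, 3), 2)` takes two loop iterations to reach `Buf`, which is why a single element-type query is not enough for multi-dimensional resource arrays.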

https://github.com/llvm/llvm-project/pull/152452


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)

2025-08-07 Thread Anton Korobeynikov via llvm-branch-commits

https://github.com/asl approved this pull request.


https://github.com/llvm/llvm-project/pull/152586


[llvm-branch-commits] [llvm] [ir] MD_prof is not UB-implying (PR #152420)

2025-08-07 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/152420

>From f0cf2e9a7ad9b45a6270c727b60e4cd15ea57d27 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Wed, 6 Aug 2025 17:43:35 -0700
Subject: [PATCH] [ir] MD_prof is not UB-implying

---
 llvm/lib/IR/Metadata.cpp  |  4 ++
 .../Transforms/LICM/hoist-phi-metadata.ll | 46 +++
 2 files changed, 50 insertions(+)
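
As a reading aid for the `Metadata.cpp` hunk below: it adds `MD_prof` to the set of metadata kinds that `dropUnknownNonDebugMetadata` keeps, unless `ProfcheckDisableMetadataFixes` is set. A minimal sketch of that keep-set filtering follows, using a toy map from kind IDs to payloads. The `Kind` values, the `Attachments` alias, `dropUnknown` and its `DisableFixes` parameter are assumptions for illustration, not LLVM's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy metadata kinds; the values are arbitrary and do not match
// LLVMContext's real kind IDs.
enum Kind : unsigned { KindProf = 1, KindDIAssignID = 2, KindRange = 3 };

using Attachments = std::map<unsigned, std::string>;

// Erase every attachment whose kind is not in the keep set.
void dropUnknown(Attachments &MD, std::set<unsigned> Known, bool DisableFixes) {
  Known.insert(KindDIAssignID); // debug-related attachment, always kept
  if (!DisableFixes)
    Known.insert(KindProf);     // profile data kept unless the fix is disabled
  for (auto It = MD.begin(); It != MD.end();) {
    if (Known.count(It->first))
      ++It;
    else
      It = MD.erase(It);
  }
}
```

In this model, hoisting an instruction that carries both profile and range attachments would keep the former and drop the latter, which matches what the new `test19` checks for `!prof` on the hoisted `select`.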

diff --git a/llvm/lib/IR/Metadata.cpp b/llvm/lib/IR/Metadata.cpp
index 1157cbe6bbc1b..ba838cd2793ce 100644
--- a/llvm/lib/IR/Metadata.cpp
+++ b/llvm/lib/IR/Metadata.cpp
@@ -57,6 +57,8 @@
 
 using namespace llvm;
 
+extern cl::opt<bool> ProfcheckDisableMetadataFixes;
+
 MetadataAsValue::MetadataAsValue(Type *Ty, Metadata *MD)
 : Value(Ty, MetadataAsValueVal), MD(MD) {
   track();
@@ -1678,6 +1680,8 @@ void 
Instruction::dropUnknownNonDebugMetadata(ArrayRef KnownIDs) {
 
   // A DIAssignID attachment is debug metadata, don't drop it.
   KnownSet.insert(LLVMContext::MD_DIAssignID);
+  if (!ProfcheckDisableMetadataFixes)
+KnownSet.insert(LLVMContext::MD_prof);
 
   Value::eraseMetadataIf([&KnownSet](unsigned MDKind, MDNode *Node) {
 return !KnownSet.count(MDKind);
diff --git a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll 
b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
index e98de9c79ea8c..6034d12d931c2 100644
--- a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
+++ b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
@@ -45,6 +45,46 @@ end:
   ret void
 }
 
+declare i32 @getv()
+
+; indirect.goto.dest2 should get hoisted, and that should not result
+; in a loss of profiling info
+define i32 @test19(i1 %cond, i1 %cond2, ptr %address, i32 %v1) nounwind {
+; CHECK-LABEL: define i32 @test19
+; CHECK-SAME: (i1 [[COND:%.*]], i1 [[COND2:%.*]], ptr [[ADDRESS:%.*]], i32 
[[V1:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[INDIRECT_GOTO_DEST:%.*]] = select i1 [[COND]], ptr 
blockaddress(@test19, [[EXIT:%.*]]), ptr [[ADDRESS]], !prof [[PROF9:![0-9]+]]
+; CHECK-NEXT:[[INDIRECT_GOTO_DEST2:%.*]] = select i1 [[COND2]], ptr 
blockaddress(@test19, [[EXIT]]), ptr [[ADDRESS]], !prof [[PROF10:![0-9]+]]
+; CHECK-NEXT:br label [[L0:%.*]]
+; CHECK:   L0:
+; CHECK-NEXT:[[V2:%.*]] = call i32 @getv()
+; CHECK-NEXT:[[SINKABLE:%.*]] = mul i32 [[V1]], [[V2]]
+; CHECK-NEXT:[[SINKABLE2:%.*]] = add i32 [[V1]], [[V2]]
+; CHECK-NEXT:indirectbr ptr [[INDIRECT_GOTO_DEST]], [label [[L1:%.*]], 
label %exit]
+; CHECK:   L1:
+; CHECK-NEXT:indirectbr ptr [[INDIRECT_GOTO_DEST2]], [label [[L0]], label 
%exit]
+; CHECK:   exit:
+; CHECK-NEXT:[[R:%.*]] = phi i32 [ [[SINKABLE]], [[L0]] ], [ 
[[SINKABLE2]], [[L1]] ]
+; CHECK-NEXT:ret i32 [[R]]
+;
+entry:
+  br label %L0
+L0:
+  %indirect.goto.dest = select i1 %cond, ptr blockaddress(@test19, %exit), ptr 
%address, !prof !10
+  %v2 = call i32 @getv()
+  %sinkable = mul i32 %v1, %v2
+  %sinkable2 = add i32 %v1, %v2
+  indirectbr ptr %indirect.goto.dest, [label %L1, label %exit]
+
+L1:
+  %indirect.goto.dest2 = select i1 %cond2, ptr blockaddress(@test19, %exit), 
ptr %address, !prof !11
+  indirectbr ptr %indirect.goto.dest2, [label %L0, label %exit]
+
+exit:
+  %r = phi i32 [%sinkable, %L0], [%sinkable2, %L1]
+  ret i32 %r
+}
+
 !llvm.module.flags = !{!2, !3}
 
 !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1)
@@ -57,6 +97,10 @@ end:
 !7 = !DILocation(line: 3, column: 22, scope: !4)
 !8 = !{!"branch_weights", i32 5, i32 7}
 !9 = !{!"branch_weights", i32 13, i32 11}
+!10 = !{!"branch_weights", i32 101, i32 189}
+!11 = !{!"branch_weights", i32 67, i32 1}
+;.
+; CHECK: attributes #[[ATTR0]] = { nounwind }
 ;.
 ; CHECK: [[META0:![0-9]+]] = !{i32 7, !"Dwarf Version", i32 5}
 ; CHECK: [[META1:![0-9]+]] = !{i32 2, !"Debug Info Version", i32 3}
@@ -67,4 +111,6 @@ end:
 ; CHECK: [[PROF6]] = !{!"branch_weights", i32 5, i32 7}
 ; CHECK: [[DBG7]] = !DILocation(line: 3, column: 22, scope: [[META3]])
 ; CHECK: [[PROF8]] = !{!"branch_weights", i32 13, i32 11}
+; CHECK: [[PROF9]] = !{!"branch_weights", i32 101, i32 189}
+; CHECK: [[PROF10]] = !{!"branch_weights", i32 67, i32 1}
 ;.



[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Helena Kotas via llvm-branch-commits


@@ -34,6 +34,10 @@ RWBuffer UAV1 : register(u2), UAV2 : register(u4);
 // CHECK: HLSLResourceBindingAttr {{.*}} "" "space5"
 RWBuffer UAV3 : register(space5);
 
+// CHECK: VarDecl {{.*}} UAV_Array 'RWBuffer[10]'

hekota wrote:

Will do.

https://github.com/llvm/llvm-project/pull/152452


[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes



---

Patch is 80.64 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152578.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+1) 
- (modified) llvm/test/CodeGen/AMDGPU/bf16-math.ll (+29-14) 
- (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+362-777) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 8f44c03d95b43..fd1be72ce6d82 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6106,6 +6106,7 @@ bool SITargetLowering::isFMAFasterThanFMulAndFAdd(const 
MachineFunction &MF,
   case MVT::f64:
 return true;
   case MVT::f16:
+  case MVT::bf16:
 return Subtarget->has16BitInsts() && !denormalModeIsFlushAllF64F16(MF);
   default:
 break;
diff --git a/llvm/test/CodeGen/AMDGPU/bf16-math.ll 
b/llvm/test/CodeGen/AMDGPU/bf16-math.ll
index 682b3b4d57209..3a82f848f06a5 100644
--- a/llvm/test/CodeGen/AMDGPU/bf16-math.ll
+++ b/llvm/test/CodeGen/AMDGPU/bf16-math.ll
@@ -370,6 +370,9 @@ define amdgpu_ps bfloat @test_clamp_bf16_folding(bfloat 
%src) {
 ; GCN:   ; %bb.0:
 ; GCN-NEXT:v_exp_bf16_e64 v0, v0 clamp
 ; GCN-NEXT:; return to shader part epilog
+
+
+
   %exp = call bfloat @llvm.exp2.bf16(bfloat %src)
   %max = call bfloat @llvm.maxnum.bf16(bfloat %exp, bfloat 0.0)
   %clamp = call bfloat @llvm.minnum.bf16(bfloat %max, bfloat 1.0)
@@ -381,6 +384,9 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x 
bfloat> %src0, <2 x bfloa
 ; GCN:   ; %bb.0:
 ; GCN-NEXT:v_pk_mul_bf16 v0, v0, v1 clamp
 ; GCN-NEXT:; return to shader part epilog
+
+
+
   %mul = fmul <2 x bfloat> %src0, %src1
   %max = call <2 x bfloat> @llvm.maxnum.v2bf16(<2 x bfloat> %mul, <2 x bfloat> 
)
   %clamp = call <2 x bfloat> @llvm.minnum.v2bf16(<2 x bfloat> %max, <2 x 
bfloat> )
@@ -391,11 +397,12 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x 
bfloat> %src0, <2 x bfloa
 define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> %b, <2 x bfloat> %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vvv:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, v3
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, v4
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, v3, v4
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -405,11 +412,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vss:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, s1
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, s1
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -419,11 +427,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x 
bfloat> inreg %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_sss:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, s0, s1
+; GCN-NEXT:v_mov_b32_e32 v2, s2
 ; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, s2
+; GCN-NEXT:v_pk_fma_bf16 v2, s0, s1, v2
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -433,11 +444,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> inreg %b) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vsc:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, 0.5 op_sel_hi:[1,0]
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, 0.5 op_sel_hi:[1,1,0]
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, 
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -447,11 +459,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_a

[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:

@asl What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/152587
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/152587
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/152587

Backport 726847829553079a13b1b7104f2c2db9dcda9c1d

Requested by: @ojhunt

>From 9a524d13b390693d91742c4f8b7465a7963b0edf Mon Sep 17 00:00:00 2001
From: Oliver Hunt 
Date: Tue, 5 Aug 2025 17:41:55 -0700
Subject: [PATCH] [clang][PAC] Fix PAC codegen for final class dynamic_cast
 optimization (#152227)

The codegen for the final class dynamic_cast optimization fails to
consider pointer authentication. This change resolves this by simply
disabling the optimization when pointer authentication is enabled.

(cherry picked from commit 726847829553079a13b1b7104f2c2db9dcda9c1d)
---
 clang/lib/CodeGen/CGExprCXX.cpp   | 3 ++-
 clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 359e30cb8f5cd..912b1d72c7e23 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address 
ThisAddr,
   bool IsExact = !IsDynamicCastToVoid &&
  CGM.getCodeGenOpts().OptimizationLevel > 0 &&
  DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() &&
- CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy);
+ CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) &&
+ !getLangOpts().PointerAuthCalls;
 
   // C++ [expr.dynamic.cast]p4:
   //   If the value of v is a null pointer value in the pointer case, the 
result
diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp 
b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
index 9a8ce1997a7f9..19c2a9bd0497e 100644
--- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
+++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
@@ -3,6 +3,7 @@
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
+// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 
 struct A { virtual ~A(); };
 struct B final : A { };
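
For readers unfamiliar with the optimization named in the commit message, here is a minimal sketch of the source pattern it applies to. `A` and `B` mirror the structs in the test file; `C` and `asB` are hypothetical names added only for illustration. Because `B` is final, the cast can succeed only when the dynamic type is exactly `B`, which is what makes an "exact" check possible; the patch turns that shortcut off when `-fptrauth-calls` is in effect, and the observable result of the cast should be the same either way.

```cpp
#include <cassert>

struct A { virtual ~A() = default; };
struct B final : A { int tag = 7; };
struct C : A {};  // hypothetical sibling class, not part of the patch

// Hypothetical helper: a dynamic_cast whose destination class is final.
// No class can derive from B, so success implies the object is exactly a B.
B *asB(A *a) { return dynamic_cast<B *>(a); }
```

Calling `asB` on a `B` object returns that object, while a `C` object or a null pointer yields `nullptr`, regardless of whether the compiler emits the exact check or the generic runtime call.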

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:

@asl What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/152586
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 726847829553079a13b1b7104f2c2db9dcda9c1d

Requested by: @ojhunt

---
Full diff: https://github.com/llvm/llvm-project/pull/152587.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CGExprCXX.cpp (+2-1) 
- (modified) clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp (+1) 


``diff
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 359e30cb8f5cd..912b1d72c7e23 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address 
ThisAddr,
   bool IsExact = !IsDynamicCastToVoid &&
  CGM.getCodeGenOpts().OptimizationLevel > 0 &&
  DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() &&
- CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy);
+ CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) &&
+ !getLangOpts().PointerAuthCalls;
 
   // C++ [expr.dynamic.cast]p4:
   //   If the value of v is a null pointer value in the pointer case, the 
result
diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp 
b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
index 9a8ce1997a7f9..19c2a9bd0497e 100644
--- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
+++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
@@ -3,6 +3,7 @@
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
+// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 
 struct A { virtual ~A(); };
 struct B final : A { };

``




https://github.com/llvm/llvm-project/pull/152587
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Ashley Coleman via llvm-branch-commits

https://github.com/V-FEXrt approved this pull request.


https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite
> (https://app.graphite.dev/github/pr/llvm/llvm-project/152578?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#152578** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/152578?utm_source=stack-comment-view-in-graphite)
* **#152573**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn 
more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/152578
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/152578

None

>From 6a9971d7cadb2dcc0169f02f92bd3f1eafb65635 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 7 Aug 2025 12:11:17 -0700
Subject: [PATCH] [AMDGPU] Enable CodeGen for v_pk_fma_bf16

---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |1 +
 llvm/test/CodeGen/AMDGPU/bf16-math.ll |   43 +-
 llvm/test/CodeGen/AMDGPU/bf16.ll  | 1139 +++--
 3 files changed, 392 insertions(+), 791 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 8f44c03d95b43..fd1be72ce6d82 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -6106,6 +6106,7 @@ bool SITargetLowering::isFMAFasterThanFMulAndFAdd(const 
MachineFunction &MF,
   case MVT::f64:
 return true;
   case MVT::f16:
+  case MVT::bf16:
 return Subtarget->has16BitInsts() && !denormalModeIsFlushAllF64F16(MF);
   default:
 break;
diff --git a/llvm/test/CodeGen/AMDGPU/bf16-math.ll 
b/llvm/test/CodeGen/AMDGPU/bf16-math.ll
index 682b3b4d57209..3a82f848f06a5 100644
--- a/llvm/test/CodeGen/AMDGPU/bf16-math.ll
+++ b/llvm/test/CodeGen/AMDGPU/bf16-math.ll
@@ -370,6 +370,9 @@ define amdgpu_ps bfloat @test_clamp_bf16_folding(bfloat 
%src) {
 ; GCN:   ; %bb.0:
 ; GCN-NEXT:v_exp_bf16_e64 v0, v0 clamp
 ; GCN-NEXT:; return to shader part epilog
+
+
+
   %exp = call bfloat @llvm.exp2.bf16(bfloat %src)
   %max = call bfloat @llvm.maxnum.bf16(bfloat %exp, bfloat 0.0)
   %clamp = call bfloat @llvm.minnum.bf16(bfloat %max, bfloat 1.0)
@@ -381,6 +384,9 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x 
bfloat> %src0, <2 x bfloa
 ; GCN:   ; %bb.0:
 ; GCN-NEXT:v_pk_mul_bf16 v0, v0, v1 clamp
 ; GCN-NEXT:; return to shader part epilog
+
+
+
   %mul = fmul <2 x bfloat> %src0, %src1
   %max = call <2 x bfloat> @llvm.maxnum.v2bf16(<2 x bfloat> %mul, <2 x bfloat> 
)
   %clamp = call <2 x bfloat> @llvm.minnum.v2bf16(<2 x bfloat> %max, <2 x 
bfloat> )
@@ -391,11 +397,12 @@ define amdgpu_ps float @test_clamp_v2bf16_folding(<2 x 
bfloat> %src0, <2 x bfloa
 define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> %b, <2 x bfloat> %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vvv:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, v3
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, v4
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, v3, v4
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -405,11 +412,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vvv(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vss:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, s1
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, s1
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -419,11 +427,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_vss(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr addrspace(1) %out, <2 x 
bfloat> inreg %a, <2 x bfloat> inreg %b, <2 x bfloat> inreg %c) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_sss:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, s0, s1
+; GCN-NEXT:v_mov_b32_e32 v2, s2
 ; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, s2
+; GCN-NEXT:v_pk_fma_bf16 v2, s0, s1, v2
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, %c
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -433,11 +444,12 @@ define amdgpu_ps void @v_test_mul_add_v2bf16_sss(ptr 
addrspace(1) %out, <2 x bfl
 define amdgpu_ps void @v_test_mul_add_v2bf16_vsc(ptr addrspace(1) %out, <2 x 
bfloat> %a, <2 x bfloat> inreg %b) {
 ; GCN-LABEL: v_test_mul_add_v2bf16_vsc:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:v_pk_mul_bf16 v2, v2, s0
-; GCN-NEXT:s_delay_alu instid0(VALU_DEP_1)
-; GCN-NEXT:v_pk_add_bf16 v2, v2, 0.5 op_sel_hi:[1,0]
+; GCN-NEXT:v_pk_fma_bf16 v2, v2, s0, 0.5 op_sel_hi:[1,1,0]
 ; GCN-NEXT:global_store_b32 v[0:1], v2, off
 ; GCN-NEXT:s_endpgm
+
+
+
   %mul = fmul contract <2 x bfloat> %a, %b
   %add = fadd contract <2 x bfloat> %mul, 
   store <2 x bfloat> %add, ptr addrspace(1) %out
@@ -447,11 +459,14 @@ define amdgpu_ps void @v_test_mul_add_v2bf1

[llvm-branch-commits] [llvm] [AMDGPU] Enable CodeGen for v_pk_fma_bf16 (PR #152578)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/152578
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)

2025-08-07 Thread Oliver Hunt via llvm-branch-commits

https://github.com/ojhunt milestoned 
https://github.com/llvm/llvm-project/pull/152271
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)

2025-08-07 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

/cherry-pick 726847829553079a13b1b7104f2c2db9dcda9c1d

https://github.com/llvm/llvm-project/pull/152271
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)

2025-08-07 Thread Oliver Hunt via llvm-branch-commits

https://github.com/ojhunt closed 
https://github.com/llvm/llvm-project/pull/152271
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/152586
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152271)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:

/pull-request llvm/llvm-project#152586

https://github.com/llvm/llvm-project/pull/152271
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)

2025-08-07 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/152586

Backport 726847829553079a13b1b7104f2c2db9dcda9c1d

Requested by: @ojhunt

>From 789c9330fa0195dc5f9cdada51ae0f187197d562 Mon Sep 17 00:00:00 2001
From: Oliver Hunt 
Date: Tue, 5 Aug 2025 17:41:55 -0700
Subject: [PATCH] [clang][PAC] Fix PAC codegen for final class dynamic_cast
 optimization (#152227)

The codegen for the final class dynamic_cast optimization fails to
consider pointer authentication. This change resolves this by simply
disabling the optimization when pointer authentication is enabled.

(cherry picked from commit 726847829553079a13b1b7104f2c2db9dcda9c1d)
---
 clang/lib/CodeGen/CGExprCXX.cpp   | 3 ++-
 clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 359e30cb8f5cd..912b1d72c7e23 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address 
ThisAddr,
   bool IsExact = !IsDynamicCastToVoid &&
  CGM.getCodeGenOpts().OptimizationLevel > 0 &&
  DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() &&
- CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy);
+ CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) &&
+ !getLangOpts().PointerAuthCalls;
 
   // C++ [expr.dynamic.cast]p4:
   //   If the value of v is a null pointer value in the pointer case, the 
result
diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp 
b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
index 9a8ce1997a7f9..19c2a9bd0497e 100644
--- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
+++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
@@ -3,6 +3,7 @@
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
+// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 
 struct A { virtual ~A(); };
 struct B final : A { };

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152586)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: None (llvmbot)


Changes

Backport 726847829553079a13b1b7104f2c2db9dcda9c1d

Requested by: @ojhunt

---
Full diff: https://github.com/llvm/llvm-project/pull/152586.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CGExprCXX.cpp (+2-1) 
- (modified) clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp (+1) 


``diff
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 359e30cb8f5cd..912b1d72c7e23 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -2313,7 +2313,8 @@ llvm::Value *CodeGenFunction::EmitDynamicCast(Address 
ThisAddr,
   bool IsExact = !IsDynamicCastToVoid &&
  CGM.getCodeGenOpts().OptimizationLevel > 0 &&
  DestRecordTy->getAsCXXRecordDecl()->isEffectivelyFinal() &&
- CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy);
+ CGM.getCXXABI().shouldEmitExactDynamicCast(DestRecordTy) &&
+ !getLangOpts().PointerAuthCalls;
 
   // C++ [expr.dynamic.cast]p4:
   //   If the value of v is a null pointer value in the pointer case, the 
result
diff --git a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp 
b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
index 9a8ce1997a7f9..19c2a9bd0497e 100644
--- a/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
+++ b/clang/test/CodeGenCXX/dynamic-cast-exact-disabled.cpp
@@ -3,6 +3,7 @@
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fvisibility=hidden -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 -fapple-kext 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 // RUN: %clang_cc1 -I%S %s -triple x86_64-apple-darwin10 -O1 
-fno-assume-unique-vtables -emit-llvm -std=c++11 -o - | FileCheck %s 
--check-prefixes=CHECK,INEXACT
+// RUN: %clang_cc1 -I%S %s -triple arm64e-apple-darwin10 -O1 -fptrauth-calls 
-emit-llvm -std=c++11 -o - | FileCheck %s --check-prefixes=CHECK,INEXACT
 
 struct A { virtual ~A(); };
 struct B final : A { };

``




https://github.com/llvm/llvm-project/pull/152586
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)

2025-08-07 Thread Changpeng Fang via llvm-branch-commits

https://github.com/changpeng approved this pull request.


https://github.com/llvm/llvm-project/pull/152592
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Ashley Coleman via llvm-branch-commits


@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) {
   return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray();
 }
 
+static const HLSLAttributedResourceType *
+getResourceArrayHandleType(VarDecl *VD) {
+  assert(VD->getType()->isHLSLResourceRecordArray() &&
+ "expected array of resource records");
+  const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
+  while (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(Ty)) {
+Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
+  }

V-FEXrt wrote:

Ahh, that makes sense!

https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)

2025-08-07 Thread Alexander Richardson via llvm-branch-commits


@@ -3532,6 +3533,28 @@ void Verifier::visitFPToSIInst(FPToSIInst &I) {
   visitInstruction(I);
 }
 
+void Verifier::visitPtrToAddrInst(PtrToAddrInst &I) {
+  // Get the source and destination types
+  Type *SrcTy = I.getOperand(0)->getType();
+  Type *DestTy = I.getType();
+
+  Check(SrcTy->isPtrOrPtrVectorTy(), "PtrToAddr source must be pointer", &I);
+  Check(DestTy->isIntOrIntVectorTy(), "PtrToAddr result must be integral", &I);
+  Check(SrcTy->isVectorTy() == DestTy->isVectorTy(), "PtrToAddr type mismatch",
+&I);
+
+  if (SrcTy->isVectorTy()) {
+auto *VSrc = cast<VectorType>(SrcTy);
+auto *VDest = cast<VectorType>(DestTy);
+Check(VSrc->getElementCount() == VDest->getElementCount(),
+  "PtrToAddr vector width mismatch", &I);
+  }
+
+  Type *AddrTy = DL.getAddressType(SrcTy);
+  Check(AddrTy == DestTy, "PtrToAddr result must be address width", &I);
+  visitInstruction(I);
+}

arichardson wrote:

I added some basic checks, but noticed we don't check ConstantAggregate values, 
so I'll deal with that in a follow up.

https://github.com/llvm/llvm-project/pull/139357
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)

2025-08-07 Thread Alexander Richardson via llvm-branch-commits

https://github.com/arichardson updated 
https://github.com/llvm/llvm-project/pull/139357

>From 25dc175562349410f161ef0e80246301d9a7ba79 Mon Sep 17 00:00:00 2001
From: Alex Richardson 
Date: Fri, 9 May 2025 22:43:37 -0700
Subject: [PATCH] fix docs build

Created using spr 1.3.6-beta.1
---
 llvm/docs/LangRef.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2d18d0d97aaee..38be6918ff73c 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -12435,7 +12435,7 @@ Example:
 .. _i_ptrtoaddr:
 
 '``ptrtoaddr .. to``' Instruction
-
+^
 
 Syntax:
 """

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [IR] Introduce the `ptrtoaddr` instruction (PR #139357)

2025-08-07 Thread Alexander Richardson via llvm-branch-commits


@@ -3532,6 +3533,28 @@ void Verifier::visitFPToSIInst(FPToSIInst &I) {
   visitInstruction(I);
 }
 
+void Verifier::visitPtrToAddrInst(PtrToAddrInst &I) {
+  // Get the source and destination types
+  Type *SrcTy = I.getOperand(0)->getType();
+  Type *DestTy = I.getType();
+
+  Check(SrcTy->isPtrOrPtrVectorTy(), "PtrToAddr source must be pointer", &I);
+  Check(DestTy->isIntOrIntVectorTy(), "PtrToAddr result must be integral", &I);
+  Check(SrcTy->isVectorTy() == DestTy->isVectorTy(), "PtrToAddr type mismatch",
+&I);
+
+  if (SrcTy->isVectorTy()) {
+auto *VSrc = cast<VectorType>(SrcTy);
+auto *VDest = cast<VectorType>(DestTy);
+Check(VSrc->getElementCount() == VDest->getElementCount(),
+  "PtrToAddr vector width mismatch", &I);

arichardson wrote:

Fixed and also changed ptrtoint and inttoptr

https://github.com/llvm/llvm-project/pull/139357
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Joshua Batista via llvm-branch-commits


@@ -71,6 +71,10 @@ static RegisterType getRegisterType(ResourceClass RC) {
   llvm_unreachable("unexpected ResourceClass value");
 }
 
+static RegisterType getRegisterType(const HLSLAttributedResourceType *ResTy) {

bob80905 wrote:

You might consider renaming one of these functions, maybe add a 
"FromResourceType" to this new one. 

https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Joshua Batista via llvm-branch-commits


@@ -34,6 +34,10 @@ RWBuffer UAV1 : register(u2), UAV2 : register(u4);
 // CHECK: HLSLResourceBindingAttr {{.*}} "" "space5"
 RWBuffer UAV3 : register(space5);
 
+// CHECK: VarDecl {{.*}} UAV_Array 'RWBuffer[10]'

bob80905 wrote:

Should we add a test case where HLSLVkBindingAttr already exists (I presume an 
explicit binding case), we check for HLSLVkBindingAttr, and check NOT that the 
new attr is added?

https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Add implicit binding attribute to resource arrays (PR #152452)

2025-08-07 Thread Ashley Coleman via llvm-branch-commits


@@ -342,6 +346,17 @@ static bool isResourceRecordTypeOrArrayOf(VarDecl *VD) {
   return Ty->isHLSLResourceRecord() || Ty->isHLSLResourceRecordArray();
 }
 
+static const HLSLAttributedResourceType *
+getResourceArrayHandleType(VarDecl *VD) {
+  assert(VD->getType()->isHLSLResourceRecordArray() &&
+ "expected array of resource records");
+  const Type *Ty = VD->getType()->getUnqualifiedDesugaredType();
+  while (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(Ty)) {
+Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
+  }

V-FEXrt wrote:

This is grabbing the last value in the CAT?

also nit:
```suggestion
  while (const ConstantArrayType *CAT = dyn_cast<ConstantArrayType>(Ty))
Ty = CAT->getArrayElementTypeNoTypeQual()->getUnqualifiedDesugaredType();
```
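
On the question of what the loop grabs: as written in the quoted hunk, it keeps replacing `Ty` with the array's element type until `Ty` is no longer a constant array, so it ends on the innermost element type rather than on any array element value. The sketch below models that walk with a toy type tree. `ToyType`, `makeScalar`, `makeArray`, and `stripArrays` are hypothetical names and are not Clang APIs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy type node: a named scalar type when Elem is null,
// otherwise a fixed-size array of Elem.
struct ToyType {
  std::string Name;
  unsigned NumElts;
  std::shared_ptr<ToyType> Elem;
};

std::shared_ptr<ToyType> makeScalar(std::string Name) {
  return std::make_shared<ToyType>(ToyType{std::move(Name), 0, nullptr});
}

std::shared_ptr<ToyType> makeArray(std::shared_ptr<ToyType> Elem, unsigned N) {
  return std::make_shared<ToyType>(ToyType{"", N, std::move(Elem)});
}

// Peel every array layer, mirroring the while loop in the review hunk.
const ToyType *stripArrays(const ToyType *Ty) {
  while (Ty->Elem)
    Ty = Ty->Elem.get();
  return Ty;
}
```

For a two-dimensional array built as `makeArray(makeArray(Buf, 4), 10)`, `stripArrays` returns the node for `Buf`; for a non-array type it returns its argument unchanged.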

https://github.com/llvm/llvm-project/pull/152452
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu edited 
https://github.com/llvm/llvm-project/pull/152554
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)

2025-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/152592

Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in
the same clause, and all Flat types in the same clause.

For VMEM/FLAT, clause types now look like:

- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
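
The two-way grouping in the list above can be sketched as a small classification function. This is an illustrative model only, not the code in `SIInsertHardClauses.cpp`: `MemKind`, `ClauseClass`, `classify`, and `canShareClause` are hypothetical names, and the real pass makes further distinctions that are not modeled here.

```cpp
#include <cassert>

// Hypothetical memory-instruction kinds, named after the list above.
enum class MemKind { Buffer, Global, Scratch, TDM, Async, Flat };

// Hypothetical clause classes for the two groups described in the PR text.
enum class ClauseClass { NonFlat, Flat };

// Everything except Flat falls into the single non-flat group.
constexpr ClauseClass classify(MemKind K) {
  return K == MemKind::Flat ? ClauseClass::Flat : ClauseClass::NonFlat;
}

// Under this model, two instructions may share a clause when they are
// in the same group.
constexpr bool canShareClause(MemKind A, MemKind B) {
  return classify(A) == classify(B);
}
```

Under this model a buffer access and a scratch access may share a clause, while a global access and a flat access may not.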

>From 7800b5d664f487df9fddbb085d9578812f598ec0 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 7 Aug 2025 13:34:44 -0700
Subject: [PATCH] [AMDGPU] Adjust hard clause rules for gfx1250

Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in
the same clause, and all Flat types in the same clause.

For VMEM/FLAT, clause types now look like:

- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
---
 .../lib/Target/AMDGPU/SIInsertHardClauses.cpp |   6 +-
 .../test/CodeGen/AMDGPU/flat-saddr-atomics.ll |   4 +
 llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll  |   5 +-
 .../CodeGen/AMDGPU/hard-clauses-gfx1250.mir   | 608 +-
 .../AMDGPU/llvm.amdgcn.struct.buffer.store.ll |   1 +
 5 files changed, 617 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
index d8fe8505bc722..0a68512668c7d 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
@@ -51,7 +51,7 @@ static cl::opt
 namespace {
 
 enum HardClauseType {
-  // For GFX10:
+  // For GFX10 and GFX1250:
 
   // Texture, buffer, global or scratch memory instructions.
   HARDCLAUSE_VMEM,
@@ -102,7 +102,8 @@ class SIInsertHardClauses {
 
   HardClauseType getHardClauseType(const MachineInstr &MI) {
 if (MI.mayLoad() || (MI.mayStore() && ST->shouldClusterStores())) {
-  if (ST->getGeneration() == AMDGPUSubtarget::GFX10) {
+  if (ST->getGeneration() == AMDGPUSubtarget::GFX10 ||
+  ST->hasGFX1250Insts()) {
 if ((SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) ||
 SIInstrInfo::isSegmentSpecificFLAT(MI)) {
   if (ST->hasNSAClauseBug()) {
@@ -115,7 +116,6 @@ class SIInsertHardClauses {
 if (SIInstrInfo::isFLAT(MI))
   return HARDCLAUSE_FLAT;
   } else {
-assert(ST->getGeneration() >= AMDGPUSubtarget::GFX11);
 if (SIInstrInfo::isMIMG(MI)) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
diff --git a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll 
b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
index 7d36c9f07ea73..004d3c0c1cf53 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
@@ -284,6 +284,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr 
inreg %sbase, i32 %vof
 ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4
 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo
+; GFX1250-SDAG-NEXT:s_clause 0x1
 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off
 ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE
 ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0
@@ -329,6 +330,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr 
inreg %sbase, i32 %vof
 ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6
 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo
+; GFX1250-GISEL-NEXT:s_clause 0x1
 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off
 ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE
 ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0
@@ -382,6 +384,7 @@ define amdgpu_ps <2 x float> 
@flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i
 ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4
 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo
+; GFX1250-SDAG-NEXT:s_clause 0x1
 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off
 ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE
 ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0
@@ -430,6 +433,7 @@ define amdgpu_ps <2 x float> 
@flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i
 ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6
 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo
+; GFX1250-GISEL-NEXT:s_clause 0x1
 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off
 ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE
 ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0
diff --git a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll 
b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
index 3a898a9214461..f0db321d3931a 100644
--- a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
@@ -244,

[llvm-branch-commits] [llvm] [AMDGPU] Adjust hard clause rules for gfx1250 (PR #152592)

2025-08-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)


Changes

Change from GFX12: Relax S_CLAUSE rules to allow all non-flat memory types in
the same clause, and all Flat types in the same clause.

The VMEM/FLAT clause types now look like:

- Non-Flat (load, store, atomic): buffer, global, scratch, TDM, Async
- Flat: load, store, atomic
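
The grouping described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: `MemKind`, `ClauseKind`, `clauseKindFor`, and `canShareClause` are hypothetical names, not part of `SIInsertHardClauses.cpp`, and TDM and Async are left out because the quoted diff does not show how they are classified.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real pass queries MachineInstr and the subtarget.
enum class MemKind { Buffer, Global, Scratch, Flat, NonMemory };
enum class ClauseKind { NonFlat, Flat, None };

// Map a memory kind to the clause group it may join under the relaxed rules.
// Loads, stores, and atomics are not distinguished, matching the description.
ClauseKind clauseKindFor(MemKind K) {
  switch (K) {
  case MemKind::Buffer:
  case MemKind::Global:
  case MemKind::Scratch:
    return ClauseKind::NonFlat;
  case MemKind::Flat:
    return ClauseKind::Flat;
  case MemKind::NonMemory:
    return ClauseKind::None;
  }
  return ClauseKind::None;
}

// Two instructions may share a clause only if both fall in the same group.
bool canShareClause(MemKind A, MemKind B) {
  ClauseKind KA = clauseKindFor(A);
  return KA != ClauseKind::None && KA == clauseKindFor(B);
}
```

In this model a scratch load followed by a scratch store can be clustered, which is consistent with the new `s_clause 0x1` lines in the updated tests, while a flat access and a buffer access cannot.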

---

Patch is 61.28 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152592.diff


5 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll (+4) 
- (modified) llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll (+3-2) 
- (modified) llvm/test/CodeGen/AMDGPU/hard-clauses-gfx1250.mir (+606-2) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.store.ll (+1) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
index d8fe8505bc722..0a68512668c7d 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
@@ -51,7 +51,7 @@ static cl::opt
 namespace {
 
 enum HardClauseType {
-  // For GFX10:
+  // For GFX10 and GFX1250:
 
   // Texture, buffer, global or scratch memory instructions.
   HARDCLAUSE_VMEM,
@@ -102,7 +102,8 @@ class SIInsertHardClauses {
 
   HardClauseType getHardClauseType(const MachineInstr &MI) {
 if (MI.mayLoad() || (MI.mayStore() && ST->shouldClusterStores())) {
-  if (ST->getGeneration() == AMDGPUSubtarget::GFX10) {
+  if (ST->getGeneration() == AMDGPUSubtarget::GFX10 ||
+  ST->hasGFX1250Insts()) {
 if ((SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) ||
 SIInstrInfo::isSegmentSpecificFLAT(MI)) {
   if (ST->hasNSAClauseBug()) {
@@ -115,7 +116,6 @@ class SIInsertHardClauses {
 if (SIInstrInfo::isFLAT(MI))
   return HARDCLAUSE_FLAT;
   } else {
-assert(ST->getGeneration() >= AMDGPUSubtarget::GFX11);
 if (SIInstrInfo::isMIMG(MI)) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(MI.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
diff --git a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll 
b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
index 7d36c9f07ea73..004d3c0c1cf53 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
@@ -284,6 +284,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr 
inreg %sbase, i32 %vof
 ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4
 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo
+; GFX1250-SDAG-NEXT:s_clause 0x1
 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off
 ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE
 ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0
@@ -329,6 +330,7 @@ define amdgpu_ps <2 x float> @flat_xchg_saddr_i64_rtn(ptr 
inreg %sbase, i32 %vof
 ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6
 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo
+; GFX1250-GISEL-NEXT:s_clause 0x1
 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off
 ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE
 ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0
@@ -382,6 +384,7 @@ define amdgpu_ps <2 x float> 
@flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i
 ; GFX1250-SDAG-NEXT:v_subrev_nc_u32_e32 v0, s1, v4
 ; GFX1250-SDAG-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-SDAG-NEXT:v_cndmask_b32_e32 v4, -1, v0, vcc_lo
+; GFX1250-SDAG-NEXT:s_clause 0x1
 ; GFX1250-SDAG-NEXT:scratch_load_b64 v[0:1], v4, off
 ; GFX1250-SDAG-NEXT:scratch_store_b64 v4, v[2:3], off scope:SCOPE_SE
 ; GFX1250-SDAG-NEXT:s_wait_xcnt 0x0
@@ -430,6 +433,7 @@ define amdgpu_ps <2 x float> 
@flat_xchg_saddr_i64_rtn_neg128(ptr inreg %sbase, i
 ; GFX1250-GISEL-NEXT:v_subrev_nc_u32_e32 v0, s1, v6
 ; GFX1250-GISEL-NEXT:s_delay_alu instid0(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:v_cndmask_b32_e32 v2, -1, v0, vcc_lo
+; GFX1250-GISEL-NEXT:s_clause 0x1
 ; GFX1250-GISEL-NEXT:scratch_load_b64 v[0:1], v2, off
 ; GFX1250-GISEL-NEXT:scratch_store_b64 v2, v[4:5], off scope:SCOPE_SE
 ; GFX1250-GISEL-NEXT:s_wait_xcnt 0x0
diff --git a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll 
b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
index 3a898a9214461..f0db321d3931a 100644
--- a/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-load-xcnt.ll
@@ -244,8 +244,9 @@ define i32 @test_v64i32_load_store(ptr addrspace(1) %ptr, 
i32 %idx, ptr addrspac
 ; GCN-GISEL-NEXT:global_load_b128 v[60:63], v[0:1], off offset:16
 ; GCN-GISEL-NEXT:global_load_b128 v[0:3], v[0:1], off offset:240
 ; GCN-GISEL-NEXT:s_wait_loadcnt 0x0
-; GCN-GI

[llvm-branch-commits] [clang] release/21.x: [clang][PAC] Fix PAC codegen for final class dynamic_cast optimization (#152227) (PR #152587)

2025-08-07 Thread Anton Korobeynikov via llvm-branch-commits

https://github.com/asl approved this pull request.


https://github.com/llvm/llvm-project/pull/152587
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits


@@ -84,6 +84,124 @@ void addRootSignature(llvm::dxbc::RootSignatureVersion 
RootSigVer,
   RootSignatureValMD->addOperand(MDVals);
 }
 
+// If the specified expr is a simple decay from an array to pointer,
+// return the array subexpression. Otherwise, return nullptr.
+static const Expr *getSubExprFromArrayDecayOperand(const Expr *E) {
+  const auto *CE = dyn_cast(E);
+  if (!CE || CE->getCastKind() != CK_ArrayToPointerDecay)
+return nullptr;
+  return CE->getSubExpr();
+}
+
+// Find array variable declaration from nested array subscript AST nodes
+static const ValueDecl *getArrayDecl(const ArraySubscriptExpr *ASE) {
+  const Expr *E = nullptr;
+  while (ASE != nullptr) {
+E = getSubExprFromArrayDecayOperand(ASE->getBase());
+if (!E)
+  return nullptr;
+ASE = dyn_cast<ArraySubscriptExpr>(E);
+  }
+  if (const DeclRefExpr *DRE = dyn_cast_or_null<DeclRefExpr>(E))
+return DRE->getDecl();
+  return nullptr;
+}
+
+// Get the total size of the array, or -1 if the array is unbounded.
+static int getTotalArraySize(const clang::Type *Ty) {
+  assert(Ty->isArrayType() && "expected array type");
+  if (Ty->isIncompleteArrayType())
+return -1;
+  int Size = 1;
+  while (const auto *CAT = dyn_cast<ConstantArrayType>(Ty)) {
+Size *= CAT->getSExtSize();
+Ty = CAT->getArrayElementTypeNoTypeQual();
+  }
+  return Size;
+}
+
+// Find constructor decl for a specific resource record type and binding
+// (implicit vs. explicit). The constructor has 6 parameters.
+// For explicit binding the signature is:
+//   void(unsigned, unsigned, int, unsigned, const char *).
+// For implicit binding the signature is:
+//   void(unsigned, int, unsigned, unsigned, const char *).
+static CXXConstructorDecl *findResourceConstructorDecl(ASTContext &AST,
+   QualType ResTy,
+   bool ExplicitBinding) {
+  SmallVector<QualType> ExpParmTypes = {
+  AST.UnsignedIntTy, AST.UnsignedIntTy, AST.UnsignedIntTy,
+  AST.UnsignedIntTy, AST.getPointerType(AST.CharTy.withConst())};
+  ExpParmTypes[ExplicitBinding ? 2 : 1] = AST.IntTy;
+
+  CXXRecordDecl *ResDecl = ResTy->getAsCXXRecordDecl();
+  for (auto *Ctor : ResDecl->ctors()) {
+if (Ctor->getNumParams() != ExpParmTypes.size())
+  continue;
+ParmVarDecl **ParmIt = Ctor->param_begin();
+QualType *ExpTyIt = ExpParmTypes.begin();
+for (; ParmIt != Ctor->param_end() && ExpTyIt != ExpParmTypes.end();
+ ++ParmIt, ++ExpTyIt) {
+  if ((*ParmIt)->getType() != *ExpTyIt)
+break;
+}
+if (ParmIt == Ctor->param_end())
+  return Ctor;
+  }
+  llvm_unreachable("did not find constructor for resource class");
+}
+
+static Value *buildNameForResource(llvm::StringRef BaseName,
+   CodeGenModule &CGM) {
+  std::string Str(BaseName);
+  std::string GlobalName(Str + ".str");
+  return CGM.GetAddrOfConstantCString(Str, GlobalName.c_str()).getPointer();
+}
+
+static void createResourceCtorArgs(CodeGenModule &CGM, CXXConstructorDecl *CD,
+   llvm::Value *ThisPtr, llvm::Value *Range,
+   llvm::Value *Index, StringRef Name,
+   HLSLResourceBindingAttr *RBA,
+   HLSLVkBindingAttr *VkBinding,
+   CallArgList &Args) {
+  assert((VkBinding || RBA) && "at least one binding attribute expected");
+
+  std::optional<uint32_t> RegisterSlot;
+  uint32_t SpaceNo = 0;
+  if (VkBinding) {
+RegisterSlot = VkBinding->getBinding();
+SpaceNo = VkBinding->getSet();
+  } else if (RBA) {
+if (RBA->hasRegisterSlot())
+  RegisterSlot = RBA->getSlotNumber();
+SpaceNo = RBA->getSpaceNumber();
+  }
+
+  ASTContext &AST = CD->getASTContext();
+  Value *NameStr = buildNameForResource(Name, CGM);
+  Value *Space = llvm::ConstantInt::get(CGM.IntTy, SpaceNo);
+
+  Args.add(RValue::get(ThisPtr), CD->getThisType());
+  if (RegisterSlot.has_value()) {

alsepkow wrote:

Ah, maybe this is where the order would matter? 
Do we want the argument ordering to match for both the explicit/implicit case? 
Right now they are different and that would explain the difference in ordering 
in the test cases that I commented on.
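
To make the ordering question concrete, here is a minimal sketch of the two argument orders as they can be read off the constructor signatures in the quoted comment and the DXIL check lines in the test. It is an assumption-laden model, not the patch's code: `Binding` and `ctorArgs` are hypothetical names, the `this` pointer and the name string are omitted, and the trailing "order id" slot for the implicit case is inferred from the test comments.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a resource binding; not a type from the patch.
struct Binding {
  std::optional<unsigned> RegisterSlot; // set only for explicit binding
  unsigned Space = 0;
  unsigned OrderId = 0;                 // assumed to apply to implicit binding only
};

// Build the integer constructor arguments in the order each case appears to use.
std::vector<std::int64_t> ctorArgs(const Binding &B, int Range, unsigned Index) {
  std::vector<std::int64_t> Args;
  if (B.RegisterSlot) {
    // Explicit: (register, space, range, index)
    Args.push_back(*B.RegisterSlot);
    Args.push_back(B.Space);
    Args.push_back(Range);
    Args.push_back(Index);
  } else {
    // Implicit: (space, range, index, order id)
    Args.push_back(B.Space);
    Args.push_back(Range);
    Args.push_back(Index);
    Args.push_back(B.OrderId);
  }
  return Args;
}
```

Under this reading, `A[2]` with `register(u10, space1)` yields `10, 1, 4, 2`, and `B[3]` with implicit binding yields `0, 5, 3, 0`, so the same position holds the register in one case and the space in the other.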

https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits

https://github.com/alsepkow edited 
https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits

https://github.com/alsepkow commented:

Submitting a couple comments.

https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [CI] Setup generate_report to describe ninja failures (PR #152621)

2025-08-07 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 created 
https://github.com/llvm/llvm-project/pull/152621

This patch makes it so that generate_report will add information about
failed build actions to the summary report. This makes it significantly
easier to find compilation failures, especially given we run ninja with
-k 0.

This patch only does the integration into generate_report (along with
testing). Actual utilization in the script is split into a separate
patch to try and keep things clean.





[llvm-branch-commits] [clang] [lldb] [PATCH 7/7] [clang] improve NestedNameSpecifier: LLDB changes (PR #149949)

2025-08-07 Thread Matheus Izvekov via llvm-branch-commits

mizvekov wrote:

I managed to fix that; the problem came from using `lld` instead of the 
macOS linker.

https://github.com/llvm/llvm-project/pull/149949


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits


@@ -0,0 +1,59 @@
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute 
-finclude-default-header \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,DXIL
+// RUN: %clang_cc1 -finclude-default-header -triple 
spirv-unknown-vulkan-compute \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,SPV
+
+// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 
1
+// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 
1
+// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 
1
+// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 
1
+
+// different explicit binding for DXIL and SPIR-V
+[[vk::binding(12, 2)]]
+RWBuffer A[4] : register(u10, space1);
+
+[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0
+RWBuffer B[5]; // DXIL implicit binding in space0
+
+// same explicit binding for both DXIL and SPIR-V
+// (SPIR-V takes the binding from register annotation if there is no 
vk::binding attribute))
+RWBuffer C[3] : register(u2);
+
+// implicit binding for both DXIL and SPIR-V in space/set 0 
+RWBuffer D[10];
+
+RWStructuredBuffer Out;
+
+[numthreads(4,1,1)]
+void main() {
+  // CHECK: define internal{{.*}} void @_Z4mainv()
+  // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer
+
+  // Make sure A[2] is translated to a RWBuffer constructor call with 
range 4 and index 2
+  // and DXIL explicit binding (u10, space1)
+  // and SPIR-V explicit binding (binding 12, set 2) 
+  // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 
noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]])
+  // SPV: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 
noundef 12, i32 noundef 2, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]])
+
+  // Make sure B[3] is translated to a RWBuffer constructor call with 
range 5 and index 3
+  // and DXIL for implicit binding in space0, order id 0
+  // and SPIR-V explicit binding (binding 13, set 0)
+  // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjijjPKc(ptr {{.*}} %[[Tmp1]], i32 
noundef 0, i32 noundef 5, i32 noundef 3, i32 noundef 0, ptr noundef @[[BufB]])
+  // SPV: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp1]], i32 
noundef 13, i32 noundef 0, i32 noundef 5, i32 noundef 3, ptr noundef @[[BufB]])
+
+  // Make sure C[1] is translated to a RWBuffer constructor call with 
range 3 and index 1
+  // and DXIL explicit binding (u2, space0) 
+  // and SPIR-V explicit binding (binding 2, set 0)
+  // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp2]], i32 
noundef 2, i32 noundef 0, i32 noundef 3, i32 noundef 1, ptr noundef @[[BufC]])
+  // SPV: call void @_ZN4hlsl8RWBufferIiEC1EjjijPKc(ptr {{.*}} %[[Tmp2]], i32 
noundef 2, i32 noundef 0, i32 noundef 3, i32 noundef 1, ptr noundef @[[BufC]])
+
+  // Make sure D[7] is translated to a RWBuffer constructor call with 
range 10 and index 7
+  // and DXIL for implicit binding in space0, order id 1
+  // and SPIR-V explicit binding (binding 13, set 0), order id 0

alsepkow wrote:

Wouldn't this be an implicit SPIR-V binding?

https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits


@@ -0,0 +1,59 @@
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute 
-finclude-default-header \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,DXIL
+// RUN: %clang_cc1 -finclude-default-header -triple 
spirv-unknown-vulkan-compute \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,SPV
+
+// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 
1
+// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 
1
+// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 
1
+// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 
1
+
+// different explicit binding for DXIL and SPIR-V
+[[vk::binding(12, 2)]]
+RWBuffer A[4] : register(u10, space1);
+
+[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0
+RWBuffer B[5]; // DXIL implicit binding in space0
+
+// same explicit binding for both DXIL and SPIR-V
+// (SPIR-V takes the binding from register annotation if there is no 
vk::binding attribute))
+RWBuffer C[3] : register(u2);
+
+// implicit binding for both DXIL and SPIR-V in space/set 0 
+RWBuffer D[10];
+
+RWStructuredBuffer Out;
+
+[numthreads(4,1,1)]
+void main() {
+  // CHECK: define internal{{.*}} void @_Z4mainv()
+  // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer
+
+  // Make sure A[2] is translated to a RWBuffer constructor call with 
range 4 and index 2
+  // and DXIL explicit binding (u10, space1)
+  // and SPIR-V explicit binding (binding 12, set 2) 
+  // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 
noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]])
+  // SPV: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 
noundef 12, i32 noundef 2, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]])
+
+  // Make sure B[3] is translated to a RWBuffer constructor call with 
range 5 and index 3
+  // and DXIL for implicit binding in space0, order id 0
+  // and SPIR-V explicit binding (binding 13, set 0)
+  // DXIL: call void @_ZN4hlsl8RWBufferIiEC1EjijjPKc(ptr {{.*}} %[[Tmp1]], i32 
noundef 0, i32 noundef 5, i32 noundef 3, i32 noundef 0, ptr noundef @[[BufB]])

alsepkow wrote:

The ordering of the operands seems inconsistent. When we're checking A on line 
38, it looks like the first operand is the register (10) and the second is the 
space (1).

But here it looks like we're being implicitly assigned register 0 and space 5, 
while the comment for B says implicit binding in space0.

https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [clang] [HLSL] Global resource arrays element access (PR #152454)

2025-08-07 Thread Alex Sepkowski via llvm-branch-commits


@@ -0,0 +1,59 @@
+// RUN: %clang_cc1 -triple dxil-pc-shadermodel6.6-compute 
-finclude-default-header \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,DXIL
+// RUN: %clang_cc1 -finclude-default-header -triple 
spirv-unknown-vulkan-compute \
+// RUN:   -emit-llvm -disable-llvm-passes -o - %s | FileCheck %s 
-check-prefixes=CHECK,SPV
+
+// CHECK: @[[BufA:.*]] = private unnamed_addr constant [2 x i8] c"A\00", align 
1
+// CHECK: @[[BufB:.*]] = private unnamed_addr constant [2 x i8] c"B\00", align 
1
+// CHECK: @[[BufC:.*]] = private unnamed_addr constant [2 x i8] c"C\00", align 
1
+// CHECK: @[[BufD:.*]] = private unnamed_addr constant [2 x i8] c"D\00", align 
1
+
+// different explicit binding for DXIL and SPIR-V
+[[vk::binding(12, 2)]]
+RWBuffer A[4] : register(u10, space1);
+
+[[vk::binding(13)]] // SPIR-V explicit binding 13, set 0
+RWBuffer B[5]; // DXIL implicit binding in space0
+
+// same explicit binding for both DXIL and SPIR-V
+// (SPIR-V takes the binding from register annotation if there is no 
vk::binding attribute))
+RWBuffer C[3] : register(u2);
+
+// implicit binding for both DXIL and SPIR-V in space/set 0 
+RWBuffer D[10];
+
+RWStructuredBuffer Out;
+
+[numthreads(4,1,1)]
+void main() {
+  // CHECK: define internal{{.*}} void @_Z4mainv()
+  // CHECK: %[[Tmp0:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp1:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp2:.*]] = alloca %"class.hlsl::RWBuffer
+  // CHECK: %[[Tmp3:.*]] = alloca %"class.hlsl::RWBuffer
+
+  // Make sure A[2] is translated to a RWBuffer constructor call with 
range 4 and index 2
+  // and DXIL explicit binding (u10, space1)
+  // and SPIR-V explicit binding (binding 12, set 2) 
+  // DXIL: call void @_ZN4hlsl8RWBufferIfEC1EjjijPKc(ptr {{.*}} %[[Tmp0]], i32 
noundef 10, i32 noundef 1, i32 noundef 4, i32 noundef 2, ptr noundef @[[BufA]])

alsepkow wrote:

Curious about the mangled name here, '@_ZN4hlsl8RWBufferIfEC1EjjijPKc'. Could 
that change? More so, I'm wondering why we have it as part of the string we're 
matching. To me it seems like something we don't care as much about.
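
For context on what that mangled name encodes, the following sketch demangles it with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang on Itanium-ABI platforms). The helper name `demangle` is hypothetical, and this says nothing about whether the test should match on the name; it only shows that the suffix spells out the constructor's parameter types.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name; fall back to the input on failure.
std::string demangle(const char *Mangled) {
  assert(Mangled && "expected a symbol name");
  int Status = 0;
  // With a null output buffer, the result is allocated with malloc.
  char *Out = abi::__cxa_demangle(Mangled, nullptr, nullptr, &Status);
  if (Status != 0 || !Out)
    return Mangled;
  std::string Result(Out);
  std::free(Out);
  return Result;
}
```

Calling `demangle("_ZN4hlsl8RWBufferIfEC1EjjijPKc")` shows a `hlsl::RWBuffer<float>` constructor whose parameter list follows the `jjij` (unsigned, unsigned, int, unsigned) pattern, whereas the implicit-binding variant ending in `jijjPKc` has the `int` in the second position. The mangled name is therefore what distinguishes the two constructor overloads in the check lines.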

https://github.com/llvm/llvm-project/pull/152454


[llvm-branch-commits] [CI] Enable Build Failure Reporting (PR #152622)

2025-08-07 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 created 
https://github.com/llvm/llvm-project/pull/152622

This patch finishes up the plumbing so that generate_test_report will dump build
failures into the GitHub checks summary.




