[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-09-12 Thread Florian Mayer via llvm-branch-commits


@@ -73,8 +74,9 @@ class SanitizerArgs {
   bool HwasanUseAliases = false;
   llvm::AsanDetectStackUseAfterReturnMode AsanUseAfterReturn =
   llvm::AsanDetectStackUseAfterReturnMode::Invalid;
-

fmayer wrote:

stray change

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] Add pointer field protection feature. (PR #133538)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits


@@ -2201,6 +2215,22 @@ void CodeGenFunction::EmitCXXConstructorCall(
 EmitTypeCheck(CodeGenFunction::TCK_ConstructorCall, Loc, This,
   getContext().getRecordType(ClassDecl), CharUnits::Zero());
 
+  // When initializing an object that has pointer field protection and whose
+  // fields are not trivially relocatable we must initialize any pointer fields
+  // to a valid signed pointer (any pointer value will do, but we just use null
+  // pointers). This is because if the object is subsequently copied, its copy
+  // constructor will need to read and authenticate any pointer fields in order
+  // to copy the object to a new address, which will fail if the pointers are
+  // uninitialized.
+  if (!getContext().arePFPFieldsTriviallyRelocatable(D->getParent())) {

pcc wrote:

Looking more closely through the standard confirms that we don't need to do 
this initialization in the compiler. Because the uninitialized fields may be 
considered to be what the standard calls "invalid pointer values", the standard 
gives us a lot of leeway for implementation-defined behavior when reading them. 
The standard specifically calls out what we want to happen here:
> Some implementations might define that copying an invalid pointer value 
> causes a system-generated runtime fault.

In practice there seem to be only a few places that need to be fixed, so we can 
just fix them.
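To make the constructor comment concrete, here is a minimal standalone sketch (all names hypothetical; pointer signing is simulated with an XOR key rather than real PAC hardware) of why a copy constructor that authenticates pointer fields needs those fields to hold valid signed values:

```cpp
#include <cstdint>

// Hypothetical stand-in for pointer field protection: a pointer field stores
// a "signed" value (simulated here by XOR with a key), and any read of the
// field must first authenticate it.
constexpr uintptr_t PFPKey = 0xA5A5A5A5u;

inline uintptr_t pfpSign(uintptr_t Raw) { return Raw ^ PFPKey; }
inline uintptr_t pfpAuth(uintptr_t Signed) { return Signed ^ PFPKey; }

struct Node {
  uintptr_t SignedPtr;

  // Initializing to a signed null gives the field a *valid* signed value,
  // which is the compiler-inserted initialization the diff comment describes.
  Node() : SignedPtr(pfpSign(0)) {}

  // The copy constructor reads and authenticates the field, then re-signs it
  // for the new object; uninitialized garbage here would fail authentication.
  Node(const Node &Other) : SignedPtr(pfpSign(pfpAuth(Other.SignedPtr))) {}
};
```

The thread's conclusion is that the standard's leeway for invalid pointer values makes this compiler-inserted initialization unnecessary.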

https://github.com/llvm/llvm-project/pull/133538
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)

2025-09-12 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl created 
https://github.com/llvm/llvm-project/pull/157754

This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop 
Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
 and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a runtime, this 
patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the sake of 
preserving the total frequency of the original loop body.
- Store the new estimated trip count in the `llvm.loop.estimated_trip_count` 
metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the original 
estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a 
remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as partial 
unrolling with a runtime and complete unrolling, and there are two associated 
tests this patch marks as XFAIL.  They will be addressed in future patches that 
should land with this patch.
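The rounding rule in the last bullet can be sketched as a ceiling division, which reproduces the example (a hedged sketch; the helper name is hypothetical and the actual logic lives in LoopUnroll.cpp):

```cpp
// After unrolling by UnrollCount, each unrolled iteration executes UnrollCount
// original iterations, so the new estimate is the old one divided by the
// unroll count, rounded up when there is a remainder: 10 / 4 = 2 remainder 2
// becomes 3 rather than truncating to 2.
unsigned estimatedTripCountAfterUnroll(unsigned OrigEstimate,
                                       unsigned UnrollCount) {
  return (OrigEstimate + UnrollCount - 1) / UnrollCount; // ceiling division
}
```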

>From 75a8df62df2ef7e8c02d7a76120e57e2dd1a1539 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" 
Date: Tue, 9 Sep 2025 17:33:38 -0400
Subject: [PATCH] [LoopUnroll] Fix block frequencies when no runtime

This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a runtime,
this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
  sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
  `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
  original estimated trip count (e.g., 10) divided by the unroll count
  (e.g., 4) leaves a remainder (e.g., 2).

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a runtime and complete unrolling, and there are
two associated tests this patch marks as XFAIL.  They will be
addressed in future patches that should land with this patch.
---
 llvm/lib/Transforms/Utils/LoopUnroll.cpp  | 36 --
 .../peel.ll}  |  0
 .../branch-weights-freq/unroll-partial.ll | 68 +++
 .../LoopUnroll/runtime-loop-branchweight.ll   |  1 +
 .../LoopUnroll/unroll-heuristics-pgo.ll   |  1 +
 5 files changed, 100 insertions(+), 6 deletions(-)
 rename llvm/test/Transforms/LoopUnroll/{peel-branch-weights-freq.ll => branch-weights-freq/peel.ll} (100%)
 create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 8a6c7789d1372..93c43396c54b6 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
   const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L);
   const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L);
-  unsigned EstimatedLoopInvocationWeight = 0;
  std::optional<unsigned> OriginalTripCount =
-  llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight);
+  llvm::getLoopEstimatedTripCount(L);
 
   // Effectively "DCE" unrolled iterations that are beyond the max tripcount
   // and will never be executed.
@@ -1130,10 +1129,35 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 // We shouldn't try to use `L` anymore.
 L = nullptr;
   } else if (OriginalTripCount) {
-// Update the trip count. Note that the remainder has already logic
-// computing it in `UnrollRuntimeLoopRemainder`.
-setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count,
-  EstimatedLoopInvocationWeight);
+// Update metadata for the estimated trip count.
+//
+// If ULO.Runtime, UnrollRuntimeLoopRemainder handles branch weights for the
+// remainder loop it creates, and the unrolled loop's branch weights are
+// adjusted below.  Otherwise, if unrolled loop iterations' latches become
+// unconditional, branch weights are adjusted above.  Otherwise, the
+// original loop's branch weights are correct for the unrolled loop, so do
+// not adjust them.
+// FIXME: Actually handle such unconditional latches and ULO.Runtime.
+//
+// For example, consider what happens if the unroll count is 4 for a loop
+// with an estimated trip count of 10 when we do not create a remainder loop
+// and all iterations' latches remain conditional.

[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-12 Thread Florian Mayer via llvm-branch-commits


@@ -1349,6 +1350,98 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB,
   CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN);
 }
 
+/// Infer type from a simple sizeof expression.
+static QualType inferTypeFromSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) {
+if (UET->getKind() == UETT_SizeOf) {
+  if (UET->isArgumentType()) {
+return UET->getArgumentTypeInfo()->getType();
+  } else {
+return UET->getArgumentExpr()->getType();
+  }
+}
+  }
+  return QualType();
+}
+
+/// Infer type from an arithmetic expression involving a sizeof.
+static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
+  const Expr *Arg = E->IgnoreParenImpCasts();
+  // The argument is a lone sizeof expression.
+  QualType QT = inferTypeFromSizeofExpr(Arg);

fmayer wrote:

```
if (QualType QT = inferTypeFromSizeofExpr(Arg); !QT.isNull())
  return QT;
```

and below
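For context, the suggested C++17 if-with-initializer pattern scopes the result to the `if` and returns early on success; a standalone sketch with hypothetical stand-ins for `QualType`:

```cpp
#include <optional>
#include <string>

// Stand-in for a QualType-returning inference helper: yields a value only
// when inference succeeds, mirroring inferTypeFromSizeofExpr's null check.
std::optional<std::string> inferType(bool Succeeds) {
  if (Succeeds)
    return "unsigned long";
  return std::nullopt;
}

std::string inferWithFallback(bool FirstSucceeds) {
  // C++17 if-with-initializer: QT is scoped to the if, and we return early
  // when it holds a result, as in the review suggestion.
  if (auto QT = inferType(FirstSucceeds); QT.has_value())
    return *QT;
  return "<none>"; // fall through to further inference strategies
}
```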

https://github.com/llvm/llvm-project/pull/156841
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lit] Support -c flag for diff (PR #157584)

2025-09-12 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 closed 
https://github.com/llvm/llvm-project/pull/157584
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

2025-09-12 Thread Florian Mayer via llvm-branch-commits


@@ -2367,6 +2371,16 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
 }
   }
 
+  if (const auto *Arg = Args.getLastArg(options::OPT_falloc_token_max_EQ)) {
+StringRef S = Arg->getValue();
+uint64_t Value = 0;
+if (S.getAsInteger(0, Value)) {

fmayer wrote:

remove braces

https://github.com/llvm/llvm-project/pull/156839
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)

2025-09-12 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/157232

>From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Sun, 7 Sep 2025 17:39:19 +
Subject: [PATCH 1/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 17 +
 llvm/test/MC/COFF/stdin.s  |  1 -
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/COFF/stdin.py
 delete mode 100644 llvm/test/MC/COFF/stdin.s

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
new file mode 100644
index 0..8b7b6ae1fba13
--- /dev/null
+++ b/llvm/test/MC/COFF/stdin.py
@@ -0,0 +1,17 @@
+# RUN: echo "// comment" > %t.input
+# RUN: which llvm-mc | %python %s %t
+
+import subprocess
+import sys
+
+llvm_mc_binary = sys.stdin.readlines()[0].strip()
+temp_file = sys.argv[1]
+input_file = temp_file + ".input"
+
+with open(temp_file, "w") as mc_stdout:
+mc_stdout.seek(4)
+subprocess.run(
+[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file],
+stdout=mc_stdout,
+check=True,
+)
diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s
deleted file mode 100644
index 8ceae7fdef501..0
--- a/llvm/test/MC/COFF/stdin.s
+++ /dev/null
@@ -1 +0,0 @@
-// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s ) > %t'

>From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:28:59 +
Subject: [PATCH 2/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 8b7b6ae1fba13..1d9b50c022523 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,14 +1,22 @@
 # RUN: echo "// comment" > %t.input
 # RUN: which llvm-mc | %python %s %t
 
+import argparse
 import subprocess
 import sys
 
+parser = argparse.ArgumentParser()
+parser.add_argument("temp_file")
+arguments = parser.parse_args()
+
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = sys.argv[1]
+temp_file = arguments.temp_file
 input_file = temp_file + ".input"
 
 with open(temp_file, "w") as mc_stdout:
+## We need to test that starting on an input stream with a non-zero offset
+## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
+## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
 [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file],

>From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:43:39 +
Subject: [PATCH 3/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 1d9b50c022523..0da1b4895142b 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,25 +1,30 @@
 # RUN: echo "// comment" > %t.input
-# RUN: which llvm-mc | %python %s %t
+# RUN: which llvm-mc | %python %s %t.input %t
 
 import argparse
 import subprocess
 import sys
 
 parser = argparse.ArgumentParser()
+parser.add_argument("input_file")
 parser.add_argument("temp_file")
 arguments = parser.parse_args()
 
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = arguments.temp_file
-input_file = temp_file + ".input"
 
-with open(temp_file, "w") as mc_stdout:
+with open(arguments.temp_file, "w") as mc_stdout:
 ## We need to test that starting on an input stream with a non-zero offset
 ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
 ## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
-[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file],
+[
+llvm_mc_binary,
+"-filetype=obj",
+"-triple",
+"i686-pc-win32",
+arguments.input_file,
+],
 stdout=mc_stdout,
 check=True,
 )
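The `seek(4)` trick in the test can be reproduced in a standalone sketch: a stream positioned past zero writes its output at that offset, leaving zero padding before it (C++ streams here; all names are hypothetical):

```cpp
#include <fstream>
#include <string>

// Write "Data" into the file at "Path" starting at byte offset "Off"; the
// bytes before the offset read back as NUL padding, mirroring an output
// stream that starts at a non-zero position.
std::string writeAtOffset(const std::string &Path, std::streamoff Off,
                          const std::string &Data) {
  {
    std::ofstream Out(Path, std::ios::binary);
    Out.seekp(Off); // start writing past offset zero
    Out << Data;
  }
  std::ifstream In(Path, std::ios::binary);
  return std::string((std::istreambuf_iterator<char>(In)),
                     std::istreambuf_iterator<char>());
}
```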

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)

2025-09-12 Thread Marco Elver via llvm-branch-commits

https://github.com/melver updated 
https://github.com/llvm/llvm-project/pull/156841


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Use lit internal shell by default (PR #157237)

2025-09-12 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/157237


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] d47a574 - Revert "[HLSL] Rewrite semantics parsing (#152537)"

2025-09-12 Thread via llvm-branch-commits

Author: Nathan Gauër
Date: 2025-09-09T19:11:28+02:00
New Revision: d47a574d9ab76ae599a1d9dadbbaf9709ab35758

URL: https://github.com/llvm/llvm-project/commit/d47a574d9ab76ae599a1d9dadbbaf9709ab35758
DIFF: https://github.com/llvm/llvm-project/commit/d47a574d9ab76ae599a1d9dadbbaf9709ab35758.diff

LOG: Revert "[HLSL] Rewrite semantics parsing (#152537)"

This reverts commit 57e1846c96f0c858f687fe9c66f4e3793b52f497.

Added: 


Modified: 
clang/include/clang/AST/Attr.h
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/DiagnosticFrontendKinds.td
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/include/clang/Basic/DiagnosticSemaKinds.td
clang/include/clang/Parse/Parser.h
clang/include/clang/Sema/SemaHLSL.h
clang/lib/Basic/Attributes.cpp
clang/lib/CodeGen/CGHLSLRuntime.cpp
clang/lib/CodeGen/CGHLSLRuntime.h
clang/lib/Parse/ParseHLSL.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/lib/Sema/SemaHLSL.cpp
clang/test/CodeGenHLSL/semantics/SV_Position.ps.hlsl
clang/test/ParserHLSL/semantic_parsing.hlsl
clang/test/SemaHLSL/Semantics/invalid_entry_parameter.hlsl
clang/utils/TableGen/ClangAttrEmitter.cpp

Removed: 
clang/test/CodeGenHLSL/semantics/DispatchThreadID-noindex.hlsl
clang/test/CodeGenHLSL/semantics/SV_GroupID-noindex.hlsl
clang/test/CodeGenHLSL/semantics/SV_GroupThreadID-noindex.hlsl
clang/test/CodeGenHLSL/semantics/missing.hlsl
clang/test/ParserHLSL/semantic_parsing_define.hlsl



diff  --git a/clang/include/clang/AST/Attr.h b/clang/include/clang/AST/Attr.h
index fe388b9fa045e..994f236337b99 100644
--- a/clang/include/clang/AST/Attr.h
+++ b/clang/include/clang/AST/Attr.h
@@ -232,40 +232,6 @@ class HLSLAnnotationAttr : public InheritableAttr {
   }
 };
 
-class HLSLSemanticAttr : public HLSLAnnotationAttr {
-  unsigned SemanticIndex = 0;
-  LLVM_PREFERRED_TYPE(bool)
-  unsigned SemanticIndexable : 1;
-  LLVM_PREFERRED_TYPE(bool)
-  unsigned SemanticExplicitIndex : 1;
-
-protected:
-  HLSLSemanticAttr(ASTContext &Context, const AttributeCommonInfo &CommonInfo,
-   attr::Kind AK, bool IsLateParsed,
-   bool InheritEvenIfAlreadyPresent, bool SemanticIndexable)
-  : HLSLAnnotationAttr(Context, CommonInfo, AK, IsLateParsed,
-   InheritEvenIfAlreadyPresent) {
-this->SemanticIndexable = SemanticIndexable;
-this->SemanticExplicitIndex = false;
-  }
-
-public:
-  bool isSemanticIndexable() const { return SemanticIndexable; }
-
-  void setSemanticIndex(unsigned SemanticIndex) {
-this->SemanticIndex = SemanticIndex;
-this->SemanticExplicitIndex = true;
-  }
-
-  unsigned getSemanticIndex() const { return SemanticIndex; }
-
-  // Implement isa/cast/dyncast/etc.
-  static bool classof(const Attr *A) {
-return A->getKind() >= attr::FirstHLSLSemanticAttr &&
-   A->getKind() <= attr::LastHLSLSemanticAttr;
-  }
-};
-
 /// A parameter attribute which changes the argument-passing ABI rule
 /// for the parameter.
 class ParameterABIAttr : public InheritableParamAttr {

diff  --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index b85abfcbecfcf..10bf96a50c982 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -779,16 +779,6 @@ class DeclOrStmtAttr : InheritableAttr;
 /// An attribute class for HLSL Annotations.
 class HLSLAnnotationAttr : InheritableAttr;
 
-class HLSLSemanticAttr : HLSLAnnotationAttr {
-  bit SemanticIndexable = Indexable;
-  int SemanticIndex = 0;
-  bit SemanticExplicitIndex = 0;
-
-  let Spellings = [];
-  let Subjects = SubjectList<[ParmVar, Field, Function]>;
-  let LangOpts = [HLSL];
-}
-
 /// A target-specific attribute.  This class is meant to be used as a mixin
 /// with InheritableAttr or Attr depending on the attribute's needs.
 class TargetSpecificAttr {
@@ -4900,6 +4890,27 @@ def HLSLNumThreads: InheritableAttr {
   let Documentation = [NumThreadsDocs];
 }
 
+def HLSLSV_GroupThreadID: HLSLAnnotationAttr {
+  let Spellings = [HLSLAnnotation<"sv_groupthreadid">];
+  let Subjects = SubjectList<[ParmVar, Field]>;
+  let LangOpts = [HLSL];
+  let Documentation = [HLSLSV_GroupThreadIDDocs];
+}
+
+def HLSLSV_GroupID: HLSLAnnotationAttr {
+  let Spellings = [HLSLAnnotation<"sv_groupid">];
+  let Subjects = SubjectList<[ParmVar, Field]>;
+  let LangOpts = [HLSL];
+  let Documentation = [HLSLSV_GroupIDDocs];
+}
+
+def HLSLSV_GroupIndex: HLSLAnnotationAttr {
+  let Spellings = [HLSLAnnotation<"sv_groupindex">];
+  let Subjects = SubjectList<[ParmVar, GlobalVar]>;
+  let LangOpts = [HLSL];
+  let Documentation = [HLSLSV_GroupIndexDocs];
+}
+
 def HLSLVkBinding : InheritableAttr {
   let Spellings = [CXX11<"vk", "binding">];
   let Subjects = SubjectList<[HLSLBufferObj, ExternalGlobalVar], ErrorDiag>;
@@ -4958,35 +4969,13 @@ def HLSLResourceBinding: InheritableAttr {
   }];
 }
 
-

[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (PR #157640)

2025-09-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/157640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)

2025-09-12 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl updated 
https://github.com/llvm/llvm-project/pull/128785

>From f4135207e955f6c2e358cad54a7ef6f2f18087f8 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" 
Date: Wed, 19 Mar 2025 16:19:40 -0400
Subject: [PATCH 1/9] [LoopPeel] Fix branch weights' effect on block
 frequencies

For example:

```
declare void @f(i32)

define void @test(i32 %n) {
entry:
  br label %do.body

do.body:
  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %inc = add i32 %i, 1
  call void @f(i32 %i)
  %c = icmp sge i32 %inc, %n
  br i1 %c, label %do.end, label %do.body, !prof !0

do.end:
  ret void
}

!0 = !{!"branch_weights", i32 1, i32 9}
```

Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9).  That is, the loop is likely to exit every 10 iterations and
thus has an estimated trip count of 10.  `opt
-passes='print<block-freq>'` shows that 10 is indeed the frequency of
the loop body:

```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
 - entry: float = 1.0, int = 1801439852625920
 - do.body: float = 10.0, int = 18014398509481984
 - do.end: float = 1.0, int = 1801439852625920
```

Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop.  This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.

Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop.  We
expect the key observation above not to change, but it does under the
implementation without this patch.  The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body.  Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body becomes 8.3.  The
new branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```

The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations.  It seems
this behavior is trying to ensure a decreasing block frequency.
However, as in the key observation above for the original loop, that
happens correctly without decreasing the branch weights across
iterations.

This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop.  The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.

Unfortunately, that change means a later analysis cannot accurately
estimate the trip count of the remaining loop while examining the
remaining loop in isolation without considering the probability of
actually reaching it.  For that purpose, this patch stores the new
trip count as separate metadata named `llvm.loop.estimated_trip_count`
and extends `llvm::getLoopEstimatedTripCount` to prefer it, if
present, over branch weights.

An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights.  However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix.  `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.
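As a hedged sketch of how such metadata typically attaches to a loop (the exact form is defined by the PR introducing `llvm.loop.estimated_trip_count` and may differ), using the example loop from above after peeling 2 of its 10 estimated iterations:

```
; Sketch only: the latch branch keeps its original branch weights, while the
; estimated trip count of the remaining loop is stored separately.
br i1 %c, label %do.end, label %do.body, !prof !0, !llvm.loop !1

!0 = !{!"branch_weights", i32 1, i32 9}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.estimated_trip_count", i32 8}
```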

This patch introduces several fixme comments that need to be addressed
before it can land.
---
 .../include/llvm/Transforms/Utils/LoopUtils.h |  25 ++-
 llvm/lib/Transforms/Utils/LoopPeel.cpp| 145 +++---
 llvm/lib/Transforms/Utils/LoopUtils.cpp   |  20 ++-
 .../LoopUnroll/peel-branch-weights-freq.ll|  75 +
 .../LoopUnroll/peel-branch-weights.ll |  64 
 .../LoopUnroll/peel-loop-pgo-deopt.ll |  11 +-
 .../Transforms/LoopUnroll/peel-loop-pgo.ll|  13 +-
 .../Transforms/LoopVectorize/X86/pr81872.ll   |  18 ++-
 8 files changed, 217 insertions(+), 154 deletions(-)
 create mode 100644 llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll

diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 8f4c0c88336ac..82d23a4b68ea1 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -315,7 +315,8 @@ TransformationMode hasLICMVersioningTransformation(const Loop *L);
 void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
  unsigned V = 0);
 
-/// Returns a loop's estimated trip count based on branch weight metadata.
+

[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits

https://github.com/ojhunt edited 
https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -846,6 +836,22 @@ static void addSanitizers(const Triple &TargetTriple,
   }
 }
 
+static void addAllocTokenPass(const Triple &TargetTriple,

ojhunt wrote:

I'd rather separate sema changes from codegen

https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -5760,6 +5764,24 @@ bool Sema::BuiltinAllocaWithAlign(CallExpr *TheCall) {
   return false;
 }
 
+bool Sema::BuiltinAllocTokenInfer(CallExpr *TheCall) {

ojhunt wrote:

I would prefer this not be a Sema member, and would prefer the static function 
with a `Sema&` parameter model instead? I find the `Sema::Builtin*(..)` naming 
model to be unnecessarily noisy and confusing, and as it looks like there's a 
mix of `Sema::` methods  and `static` functions, I wonder if there's a strong 
preference among others?

cc @Endilll, and @AaronBallman (who I believe is on vacation or similar so I 
would expect/hope not to get an immediate reply) 
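The two styles under discussion can be contrasted in a standalone sketch (all names hypothetical, unrelated to the real Sema API):

```cpp
// Minimal stand-in for a call expression being checked.
struct CallExpr {
  int NumArgs;
};

// Style 1: member function, the Sema::Builtin*(...) naming model.
struct SemaMember {
  bool BuiltinAllocTokenInfer(CallExpr *TheCall) {
    return TheCall->NumArgs >= 1;
  }
};

// Style 2: file-local static function taking Sema& (the model preferred in
// the review), which keeps the class interface smaller.
struct SemaRef {};
static bool checkBuiltinAllocTokenInfer(SemaRef &S, CallExpr *TheCall) {
  (void)S; // would be used for diagnostics in real code
  return TheCall->NumArgs >= 1;
}
```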



https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache {
   SanitizerAnnotateDebugInfo(ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals,
  SanitizerHandler Handler);
 
-  /// Emit additional metadata used by the AllocToken instrumentation.
+  /// Emit metadata used by the AllocToken instrumentation.
+  llvm::MDNode *EmitAllocTokenHint(QualType AllocType);

ojhunt wrote:

Or Compute? or something other than Emit. You're not emitting the hint, you're 
simply constructing it to permit it to be usable in multiple places :D

https://github.com/llvm/llvm-project/pull/156842
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156837

>From adf9d42e554437a8e816e190a8ad64ae4770404c Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 4 Sep 2025 01:06:21 -0500
Subject: [PATCH] [flang][OpenMP] Support multi-block reduction combiner 
 regions on the GPU

Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The IP at the end of the inlined region was
not used, resulting in the emission of basic blocks with multiple terminators.
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +
 .../omptarget-multi-block-reduction.mlir  | 85 +++
 2 files changed, 88 insertions(+)
 create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3d5e487c8990f..fe00a2a5696dc 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -3506,6 +3506,8 @@ Expected<Function *> OpenMPIRBuilder::createReductionFunction(
 return AfterIP.takeError();
   if (!Builder.GetInsertBlock())
 return ReductionFunc;
+
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHSPtr);
 }
   }
@@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU(
   RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
   if (!AfterIP)
 return AfterIP.takeError();
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHS, false);
 }
   }
diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
new file mode 100644
index 0..aaf06d2d0e0c2
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
@@ -0,0 +1,85 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Verifies that the IR builder can handle reductions with multi-block combiner
+// regions on the GPU.
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} {
+  llvm.func @bar() {}
+  llvm.func @baz() {}
+
+  omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc {
+%0 = llvm.mlir.constant(1 : i64) : i64
+%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5>
+%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr
+omp.yield(%2 : !llvm.ptr)
+  } init {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+omp.yield(%arg1 : !llvm.ptr)
+  } combiner {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+llvm.call @bar() : () -> ()
+llvm.br ^bb3
+
+  ^bb3:  // pred: ^bb1
+llvm.call @baz() : () -> ()
+omp.yield(%arg0 : !llvm.ptr)
+  }
+  llvm.func @foo_() {
+%c1 = llvm.mlir.constant(1 : i64) : i64
+%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5>
+%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr
+%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"}
+omp.target map_entries(%74 -> %arg0 : !llvm.ptr) {
+  %c1_2 = llvm.mlir.constant(1 : i32) : i32
+  %c10 = llvm.mlir.constant(10 : i32) : i32
+  omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) {
+omp.parallel {
+  omp.distribute {
+omp.wsloop {
+  omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) {
+omp.yield
+  }
+} {omp.composite}
+  } {omp.composite}
+  omp.terminator
+} {omp.composite}
+omp.terminator
+  }
+  omp.terminator
+}
+llvm.return
+  }
+}
+
+// CHECK:  call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1,
+// CHECK-SAME:   ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1)
+
+// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} {
+// CHECK:   .omp.reduction.then:
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
+// CHECK: br label %[[BODY_2ND_BB:.*]]
+
+// CHECK:   [[BODY_2ND_BB]]:
+// CHECK: call void @baz()
+// CHECK: br label %[[CONT_BB:.*]]
+
+// CHECK:   [[CONT_BB]]:
+// CHECK: br label %.omp.reduction.done
+// CHECK: }
+
+// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef 
%0, ptr noundef %1) #0 {
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   [[BODY_2ND_BB:.*]]:
+// CHECK: call void @baz()
+// CHECK: br label %omp.region.cont
+
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK:   call void @bar()

[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/157638

>From cbb2c67df6d5a234dc66ae012f88c1ff36f1ac47 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 05:54:00 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device

Extends support for mapping `do concurrent` loops on the device by adding
support for `local` specifiers. The changes in this PR map each local
variable to the `omp.target` op and use the mapped value as the `private`
clause operand of the nested `omp.parallel` op.
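A rough sketch of the intended lowering (illustrative only; the privatizer and value names below are assumptions, not the contents of the actual `local_device.mlir` test):

```mlir
// Input (simplified): a device-mapped loop with a `local` specifier.
fir.do_concurrent {
  fir.do_concurrent.loop (%iv) = (%lb) to (%ub) step (%st)
      local(@privatizer %x -> %lx : !fir.ref<f32>) {
    // ... uses of %lx ...
  }
}

// Output (simplified): the local variable becomes a map entry on the
// `omp.target` op, and the mapped block argument feeds the `private`
// clause of the nested `omp.parallel`.
omp.target map_entries(%xmap -> %arg0 : !llvm.ptr) {
  omp.teams {
    omp.parallel private(@privatizer %arg0 -> %p : !llvm.ptr) {
      omp.distribute {
        omp.wsloop {
          omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%st) { omp.yield }
        } {omp.composite}
      } {omp.composite}
      omp.terminator
    } {omp.composite}
    omp.terminator
  }
  omp.terminator
}
```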
---
 .../include/flang/Optimizer/Dialect/FIROps.td |  12 ++
 .../OpenMP/DoConcurrentConversion.cpp | 192 +++---
 .../Transforms/DoConcurrent/local_device.mlir |  49 +
 3 files changed, 175 insertions(+), 78 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir

diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td 
b/flang/include/flang/Optimizer/Dialect/FIROps.td
index bc971e8fd6600..fc6eedc6ed4c6 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
   return getReduceVars().size();
 }
 
+unsigned getInductionVarsStart() {
+  return 0;
+}
+
+unsigned getLocalOperandsStart() {
+  return getNumInductionVars();
+}
+
+unsigned getReduceOperandsStart() {
+  return getLocalOperandsStart() + getNumLocalOperands();
+}
+
 mlir::Block::BlockArgListType getInductionVars() {
   return getBody()->getArguments().slice(0, getNumInductionVars());
 }
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 6c71924000842..d00a4fdd2cf2e 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
 liveIns.push_back(operand->get());
   });
+
+  for (mlir::Value local : loop.getLocalVars())
+liveIns.push_back(local);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -298,8 +301,7 @@ class DoConcurrentConversion
   .getIsTargetDevice();
 
   mlir::omp::TargetOperands targetClauseOps;
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps,
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps,
isTargetDevice ? nullptr : &targetClauseOps);
 
   LiveInShapeInfoMap liveInShapeInfoMap;
@@ -321,14 +323,13 @@ class DoConcurrentConversion
 }
 
 mlir::omp::ParallelOp parallelOp =
-genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper);
+genParallelOp(rewriter, loop, ivInfos, mapper);
 
 // Only set as composite when part of `distribute parallel do`.
 parallelOp.setComposite(mapToDevice);
 
 if (!mapToDevice)
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps);
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps);
 
 for (mlir::Value local : locals)
   looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
@@ -337,10 +338,38 @@ class DoConcurrentConversion
 if (mapToDevice)
   genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true);
 
-mlir::omp::LoopNestOp ompLoopNest =
+auto [loopNestOp, wsLoopOp] =
 genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps,
 /*isComposite=*/mapToDevice);
 
+// `local` region arguments are transferred/cloned from the `do concurrent`
+// loop to the loopnest op when the region is cloned above. Instead, these
+// region arguments should be on the workshare loop's region.
+if (mapToDevice) {
+  for (auto [parallelArg, loopNestArg] : llvm::zip_equal(
+   parallelOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getLocalOperandsStart(), loop.getNumLocalOperands(
+rewriter.replaceAllUsesWith(loopNestArg, parallelArg);
+
+  for (auto [wsloopArg, loopNestArg] : llvm::zip_equal(
+   wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getReduceOperandsStart(), 
loop.getNumReduceOperands(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+} else {
+  for (auto [wsloopArg, loopNestArg] :
+   llvm::zip_equal(wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().drop_front(
+   loopNestClauseOps.loopLowerBounds.size(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+}
+
+for (unsigned i = 0;
+ i 

[llvm-branch-commits] [llvm] Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (PR #157639)

2025-09-12 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

### Merge activity

* **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157639).


https://github.com/llvm/llvm-project/pull/157639
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Support "cluster" syncscope (PR #157641)

2025-09-12 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

### Merge activity

* **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157641).


https://github.com/llvm/llvm-project/pull/157641


[llvm-branch-commits] [llvm] Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (PR #157639)

2025-09-12 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> Why do we want to revert it? Can you put it into the description as well?

It's not a feature we need anymore for gfx1250. I updated the description.

https://github.com/llvm/llvm-project/pull/157639


[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a ready_for_review 
https://github.com/llvm/llvm-project/pull/157810


[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

ritter-x2a wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#157810** 👈 (View in Graphite)
* **#157682**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev). Learn
more about [stacking](https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/157810


[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)

2025-09-12 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.


https://github.com/llvm/llvm-project/pull/157810


[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-09-12 Thread via llvm-branch-commits

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/150170

>From be85e6c0222fe757ac59959bad5c56a85a32b869 Mon Sep 17 00:00:00 2001
From: Aaditya 
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def |  25 ++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |  58 +++
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl  | 378 +++
 3 files changed, 461 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e5a1422fe8778..56b1a8dc09b15 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
 BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
 BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
 
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
 
//===--===//
 // R600-NI only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp 
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 87a46287c4022..07cf08c54985a 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+  switch (BuiltinID) {
+  default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+  }
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
   llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
   llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u

[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)

2025-09-12 Thread via llvm-branch-commits

https://github.com/heiher created 
https://github.com/llvm/llvm-project/pull/157819

This patch introduces the LASX and LSX conversion intrinsics:

- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128, __m128d)
- __m256i __lasx_concat_128 (__m128, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)

>From 91ca73f8a3ffa1b5e750252984e1a5d8f6097d28 Mon Sep 17 00:00:00 2001
From: WANG Rui 
Date: Wed, 10 Sep 2025 17:11:10 +0800
Subject: [PATCH] [clang][LoongArch] Introduce LASX and LSX conversion
 intrinsics

This patch introduces the LASX and LSX conversion intrinsics:

- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128, __m128d)
- __m256i __lasx_concat_128 (__m128, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)
---
 .../clang/Basic/BuiltinsLoongArchLASX.def |  19 +++
 clang/lib/Headers/lasxintrin.h| 110 
 .../CodeGen/LoongArch/lasx/builtin-alias.c| 153 +
 clang/test/CodeGen/LoongArch/lasx/builtin.c   | 157 ++
 4 files changed, 439 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def 
b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
index c4ea46a3bc5b5..b234dedad648e 100644
--- a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
+++ b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
@@ -986,3 +986,22 @@ TARGET_BUILTIN(__builtin_lasx_xbnz_b, "iV32Uc", "nc", 
"lasx")
 TARGET_BUILTIN(__builtin_lasx_xbnz_h, "iV16Us", "nc", "lasx")
 TARGET_BUILTIN(__builtin_lasx_xbnz_w, "iV8Ui", "nc", "lasx")
 TARGET_BUILTIN(__builtin_lasx_xbnz_d, "iV4ULLi", "nc", "lasx")
+
+TARGET_BUILTIN(__builtin_lasx_cast_128_s, "V8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128_d, "V4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128, "V32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_s, "V8fV4fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_d, "V4dV2dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128, "V32ScV16ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo, "V32ScV32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi, "V32ScV32ScV16Sc", "nc", "lasx")
diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h
index 85020d82829e2..6dd8ac24ed46d 100644
--- a/clang/lib/Headers/lasxintrin.h
+++ b/clang/lib/Headers/lasxintrin.h
@@ -10,6 +10,8 @@
 #ifndef _LOONGSON_ASXINTRIN_H
 #define _LOONGSON_ASXINTRIN_H 1
 
+#include 
+
 #if defined(__loongarch_asx)
 
 typedef signed char v32i8 __attribute__((vector_size(32), aligned(32)));
@@ -3882,5 +3884,113 @@ extern __inline
 
 #define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1)))
 
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
+__lasx_cast_128_s(__m128 _1) {
+  return (__m256)__builtin_lasx_cast_128_s((v4f32)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_in

[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)

2025-09-12 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff origin/main HEAD --extensions h,c -- 
clang/lib/Headers/lasxintrin.h 
clang/test/CodeGen/LoongArch/lasx/builtin-alias.c 
clang/test/CodeGen/LoongArch/lasx/builtin.c
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:





View the diff from clang-format here.


```diff
diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h
index 6dd8ac24e..417671ffd 100644
--- a/clang/lib/Headers/lasxintrin.h
+++ b/clang/lib/Headers/lasxintrin.h
@@ -3885,8 +3885,8 @@ extern __inline
 #define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1)))
 
 extern __inline
-__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
-__lasx_cast_128_s(__m128 _1) {
+__attribute__((__gnu_inline__, __always_inline__,
+   __artificial__)) __m256 __lasx_cast_128_s(__m128 _1) {
   return (__m256)__builtin_lasx_cast_128_s((v4f32)_1);
 }
 

```




https://github.com/llvm/llvm-project/pull/157819


[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-09-12 Thread via llvm-branch-commits

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/150170

>From 308545da2b700e93d2c4b5e32c8392468385 Mon Sep 17 00:00:00 2001
From: Aaditya 
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def |  25 ++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |  58 +++
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl  | 378 +++
 3 files changed, 461 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e5a1422fe8778..56b1a8dc09b15 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
 BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
 BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
 
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
 
//===--===//
 // R600-NI only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp 
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 87a46287c4022..07cf08c54985a 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+  switch (BuiltinID) {
+  default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+  }
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
   llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
   llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u

[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache {
   SanitizerAnnotateDebugInfo(ArrayRef 
Ordinals,
  SanitizerHandler Handler);
 
-  /// Emit additional metadata used by the AllocToken instrumentation.
+  /// Emit metadata used by the AllocToken instrumentation.
+  llvm::MDNode *EmitAllocTokenHint(QualType AllocType);

ojhunt wrote:

I think this should be something like `BuildAllocTokenHint` -- also, does LLVM 
permit multiple nodes to share this hint?

This is basically a "can this be cached and reused?" question - for TMO we 
needed to cache the Type->descriptor mapping to avoid any compile-time 
regression - though I guess the TMO descriptors are more expensive to produce.

https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-09-12 Thread via llvm-branch-commits

easyonaadit wrote:

### Merge activity

* **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150170).


https://github.com/llvm/llvm-project/pull/150170


[llvm-branch-commits] [llvm] [AMDGPU] Extending wave reduction intrinsics for `i64` types - 2 (PR #151309)

2025-09-12 Thread via llvm-branch-commits

easyonaadit wrote:

### Merge activity

* **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151309).


https://github.com/llvm/llvm-project/pull/151309


[llvm-branch-commits] [llvm] [AMDGPU] Extending wave reduction intrinsics for `i64` types - 3 (PR #151310)

2025-09-12 Thread via llvm-branch-commits

easyonaadit wrote:

### Merge activity

* **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151310).


https://github.com/llvm/llvm-project/pull/151310


[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-09-12 Thread via llvm-branch-commits

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/150170

>From 207c0b3f427403f0e504f9631f9d7523aecdb0a8 Mon Sep 17 00:00:00 2001
From: Aaditya 
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def |  25 ++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |  58 +++
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl  | 378 +++
 3 files changed, 461 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e5a1422fe8778..56b1a8dc09b15 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
 BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
 BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
 
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
 
//===--===//
 // R600-NI only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp 
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 87a46287c4022..07cf08c54985a 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+  switch (BuiltinID) {
+  default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+  }
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
   llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
   llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u

[llvm-branch-commits] [llvm] [AMDGPU] Propagate Constants for Wave Reduction Intrinsics (PR #150395)

2025-09-12 Thread via llvm-branch-commits

easyonaadit wrote:

### Merge activity

* **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150395).


https://github.com/llvm/llvm-project/pull/150395
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (PR #157640)

2025-09-12 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

### Merge activity

* **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157640).


https://github.com/llvm/llvm-project/pull/157640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (PR #157845)

2025-09-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)


Changes

Use the same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD
should always be divergent.

---
Full diff: https://github.com/llvm/llvm-project/pull/157845.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+8-7) 
- (modified) llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir 
(+12-8) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5c958dfe6954f..398c99b3bd127 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -10281,7 +10281,7 @@ unsigned SIInstrInfo::getInstrLatency(const 
InstrItineraryData *ItinData,
 InstructionUniformity
 SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
   const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
-  unsigned opcode = MI.getOpcode();
+  unsigned Opcode = MI.getOpcode();
 
   auto HandleAddrSpaceCast = [this, &MRI](const MachineInstr &MI) {
 Register Dst = MI.getOperand(0).getReg();
@@ -10301,7 +10301,7 @@ SIInstrInfo::getGenericInstructionUniformity(const 
MachineInstr &MI) const {
   // If the target supports globally addressable scratch, the mapping from
   // scratch memory to the flat aperture changes therefore an address space 
cast
   // is no longer uniform.
-  if (opcode == TargetOpcode::G_ADDRSPACE_CAST)
+  if (Opcode == TargetOpcode::G_ADDRSPACE_CAST)
 return HandleAddrSpaceCast(MI);
 
   if (auto *GI = dyn_cast(&MI)) {
@@ -10329,7 +10329,8 @@ SIInstrInfo::getGenericInstructionUniformity(const 
MachineInstr &MI) const {
   //
   // All other loads are not divergent, because if threads issue loads with the
   // same arguments, they will always get the same result.
-  if (opcode == AMDGPU::G_LOAD) {
+  if (Opcode == AMDGPU::G_LOAD || Opcode == AMDGPU::G_ZEXTLOAD ||
+  Opcode == AMDGPU::G_SEXTLOAD) {
 if (MI.memoperands_empty())
   return InstructionUniformity::NeverUniform; // conservative assumption
 
@@ -10343,10 +10344,10 @@ SIInstrInfo::getGenericInstructionUniformity(const 
MachineInstr &MI) const {
 return InstructionUniformity::Default;
   }
 
-  if (SIInstrInfo::isGenericAtomicRMWOpcode(opcode) ||
-  opcode == AMDGPU::G_ATOMIC_CMPXCHG ||
-  opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS ||
-  AMDGPU::isGenericAtomic(opcode)) {
+  if (SIInstrInfo::isGenericAtomicRMWOpcode(Opcode) ||
+  Opcode == AMDGPU::G_ATOMIC_CMPXCHG ||
+  Opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS ||
+  AMDGPU::isGenericAtomic(Opcode)) {
 return InstructionUniformity::NeverUniform;
   }
   return InstructionUniformity::Default;
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir 
b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
index cb3c2de5b8753..d799cd2057f47 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
@@ -46,13 +46,13 @@ body: |
 %6:_(p5) = G_IMPLICIT_DEF
 
 ; Atomic load
-; CHECK-NOT: DIVERGENT
-
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
 %0:_(s32) = G_ZEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`)
 
 ; flat load
-; CHECK-NOT: DIVERGENT
-
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
 %2:_(s32) = G_ZEXTLOAD %1(p0) :: (load (s16) from `ptr undef`)
 
 ; Gloabal load
@@ -60,7 +60,8 @@ body: |
 %3:_(s32) = G_ZEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) 
undef`, addrspace 1)
 
 ; Private load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
 %5:_(s32) = G_ZEXTLOAD %6(p5) :: (volatile load (s16) from `ptr 
addrspace(5) undef`, addrspace 5)
 G_STORE %2(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) 
undef`, addrspace 1)
 G_STORE %3(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) 
undef`, addrspace 1)
@@ -80,11 +81,13 @@ body: |
 %6:_(p5) = G_IMPLICIT_DEF
 
 ; Atomic load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
 %0:_(s32) = G_SEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`)
 
 ; flat load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
 %2:_(s32) = G_SEXTLOAD %1(p0) :: (load (s16) from `ptr undef`)
 
 ; Gloabal load
@@ -92,7 +95,8 @@ body: |
 %3:_(s32) = G_SEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) 
undef`, addrspace 1)
 
 ; Private load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
 %5:_(s32) = G_SEXTLOAD %6(p5) :: (volatile load (s16) from `ptr 
addrspace(5) undef`, addrspace 5)
 G_STORE %2(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) 
undef`, ad

[llvm-branch-commits] [llvm] [Remarks] Remove redundant size from StringRefs (NFC) (PR #156357)

2025-09-12 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler updated 
https://github.com/llvm/llvm-project/pull/156357

>From e3951bca5a4a5c169975f13faa679a761455976a Mon Sep 17 00:00:00 2001
From: Tobias Stadler 
Date: Mon, 1 Sep 2025 19:02:32 +0100
Subject: [PATCH] fix format

Created using spr 1.3.7-wip
---
 llvm/include/llvm/Remarks/BitstreamRemarkContainer.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h 
b/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h
index 2e378fd755588..48a148a3adc13 100644
--- a/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h
+++ b/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h
@@ -96,7 +96,8 @@ constexpr StringLiteral MetaExternalFileName("External File");
 constexpr StringLiteral RemarkHeaderName("Remark header");
 constexpr StringLiteral RemarkDebugLocName("Remark debug location");
 constexpr StringLiteral RemarkHotnessName("Remark hotness");
-constexpr StringLiteral RemarkArgWithDebugLocName("Argument with debug 
location");
+constexpr StringLiteral
+RemarkArgWithDebugLocName("Argument with debug location");
 constexpr StringLiteral RemarkArgWithoutDebugLocName("Argument");
 
 } // end namespace remarks

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Nikolas Klauser via llvm-branch-commits


@@ -26,6 +26,12 @@
 #  include 
 #endif
 
+#if defined(__POINTER_FIELD_PROTECTION__)
+constexpr bool pfp_disabled = false;
+#else
+constexpr bool pfp_disabled = true;
+#endif

philnik777 wrote:

Again, can we just disable the test instead?

https://github.com/llvm/llvm-project/pull/151651
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Nikolas Klauser via llvm-branch-commits


@@ -1262,6 +1275,14 @@ typedef __char32_t char32_t;
 #define _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER 0
 #  endif
 
+#  if defined(__POINTER_FIELD_PROTECTION__)
+#define _LIBCPP_PFP [[clang::pointer_field_protection]]
+#define _LIBCPP_NO_PFP [[clang::no_field_protection]]

philnik777 wrote:

These should be _Uglified. Do these attributes do anything with pfp disabled? 
If not, why not simply check for their availability like with other attributes?

https://github.com/llvm/llvm-project/pull/151651
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Nikolas Klauser via llvm-branch-commits


@@ -484,8 +484,21 @@ typedef __char32_t char32_t;
 #define _LIBCPP_EXCEPTIONS_SIG e
 #  endif
 
+#  if !_LIBCPP_HAS_EXCEPTIONS
+#define _LIBCPP_EXCEPTIONS_SIG n
+#  else
+#define _LIBCPP_EXCEPTIONS_SIG e
+#  endif
+
+#  if defined(__POINTER_FIELD_PROTECTION__)
+#define _LIBCPP_PFP_SIG p
+#  else
+#define _LIBCPP_PFP_SIG
+#  endif

philnik777 wrote:

My understanding is that pfp changes the layout of certain types? Why should 
there be an ABI tag for it?

https://github.com/llvm/llvm-project/pull/151651
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc++] Add build and CI support for pointer field protection (PR #152414)

2025-09-12 Thread Nikolas Klauser via llvm-branch-commits


@@ -411,6 +411,42 @@ bootstrapping-build)
 
 ccache -s
 ;;
+bootstrapping-build-pfp)

philnik777 wrote:

A bootstrapping build is incredibly heavyweight. Why is this required?

https://github.com/llvm/llvm-project/pull/152414
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-12 Thread Abid Qadeer via llvm-branch-commits


@@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createReductionsGPU(
   RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
   if (!AfterIP)
 return AfterIP.takeError();
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());

abidh wrote:

```suggestion
  Builder.restoreIP(*AfterIP);
```

https://github.com/llvm/llvm-project/pull/156837
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] Backport AArch64 sanitizer fixes to 21.x. (PR #157848)

2025-09-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Michał Górny (mgorny)


Changes

Backport the following commits to the 21.x branch:
- 19cfc30
- 4485a3f
- 6beb6f3

---
Full diff: https://github.com/llvm/llvm-project/pull/157848.diff


11 Files Affected:

- (modified) compiler-rt/lib/gwp_asan/tests/basic.cpp (+6-5) 
- (modified) compiler-rt/lib/gwp_asan/tests/never_allocated.cpp (+6-4) 
- (modified) compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp (+1) 
- (modified) compiler-rt/test/cfi/cross-dso/lit.local.cfg.py (+4) 
- (modified) compiler-rt/test/dfsan/atomic.cpp (+5-2) 
- (modified) compiler-rt/test/lit.common.cfg.py (+17) 
- (modified) compiler-rt/test/msan/dtls_test.c (+1) 
- (modified) 
compiler-rt/test/sanitizer_common/TestCases/Linux/odd_stack_size.cpp (+1) 
- (modified) 
compiler-rt/test/sanitizer_common/TestCases/Linux/release_to_os_test.cpp (+3) 
- (modified) 
compiler-rt/test/sanitizer_common/TestCases/Linux/resize_tls_dynamic.cpp (+3) 
- (modified) compiler-rt/test/sanitizer_common/TestCases/Linux/tls_get_addr.c 
(+3) 


``diff
diff --git a/compiler-rt/lib/gwp_asan/tests/basic.cpp 
b/compiler-rt/lib/gwp_asan/tests/basic.cpp
index 88e7ed14a5c2f..7d36a2ee1f947 100644
--- a/compiler-rt/lib/gwp_asan/tests/basic.cpp
+++ b/compiler-rt/lib/gwp_asan/tests/basic.cpp
@@ -65,11 +65,12 @@ TEST_F(DefaultGuardedPoolAllocator, NonPowerOfTwoAlignment) 
{
 
 // Added multi-page slots? You'll need to expand this test.
 TEST_F(DefaultGuardedPoolAllocator, TooBigForSinglePageSlots) {
-  EXPECT_EQ(nullptr, GPA.allocate(0x1001, 0));
-  EXPECT_EQ(nullptr, GPA.allocate(0x1001, 1));
-  EXPECT_EQ(nullptr, GPA.allocate(0x1001, 0x1000));
-  EXPECT_EQ(nullptr, GPA.allocate(1, 0x2000));
-  EXPECT_EQ(nullptr, GPA.allocate(0, 0x2000));
+  size_t PageSize = sysconf(_SC_PAGESIZE);
+  EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, 0));
+  EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, 1));
+  EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, PageSize));
+  EXPECT_EQ(nullptr, GPA.allocate(1, 2 * PageSize));
+  EXPECT_EQ(nullptr, GPA.allocate(0, 2 * PageSize));
 }
 
 TEST_F(CustomGuardedPoolAllocator, AllocAllSlots) {
diff --git a/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp 
b/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp
index 2f695b4379861..37a4b384e4ac0 100644
--- a/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp
+++ b/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp
@@ -13,8 +13,10 @@
 #include "gwp_asan/tests/harness.h"
 
 TEST_P(BacktraceGuardedPoolAllocatorDeathTest, NeverAllocated) {
+  size_t PageSize = sysconf(_SC_PAGESIZE);
+
   SCOPED_TRACE("");
-  void *Ptr = GPA.allocate(0x1000);
+  void *Ptr = GPA.allocate(PageSize);
   GPA.deallocate(Ptr);
 
   std::string DeathNeedle =
@@ -23,7 +25,7 @@ TEST_P(BacktraceGuardedPoolAllocatorDeathTest, 
NeverAllocated) {
   // Trigger a guard page in a completely different slot that's never 
allocated.
   // Previously, there was a bug that this would result in nullptr-dereference
   // in the posix crash handler.
-  char *volatile NeverAllocatedPtr = static_cast(Ptr) + 0x3000;
+  char *volatile NeverAllocatedPtr = static_cast(Ptr) + 3 * PageSize;
   if (!Recoverable) {
 EXPECT_DEATH(*NeverAllocatedPtr = 0, DeathNeedle);
 return;
@@ -37,8 +39,8 @@ TEST_P(BacktraceGuardedPoolAllocatorDeathTest, 
NeverAllocated) {
   GetOutputBuffer().clear();
   for (size_t i = 0; i < 100; ++i) {
 *NeverAllocatedPtr = 0;
-*(NeverAllocatedPtr + 0x2000) = 0;
-*(NeverAllocatedPtr + 0x3000) = 0;
+*(NeverAllocatedPtr + 2 * PageSize) = 0;
+*(NeverAllocatedPtr + 3 * PageSize) = 0;
 ASSERT_TRUE(GetOutputBuffer().empty());
   }
 
diff --git a/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp 
b/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp
index 3e28ffde46ab6..dc3ead9e8436c 100644
--- a/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp
+++ b/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp
@@ -6,6 +6,7 @@
 // RUN: %env_asan_opts=allocator_release_to_os_interval_ms=-1 %run %t force 
2>&1 | FileCheck %s --check-prefix=FORCE_RELEASE
 
 // REQUIRES: x86_64-target-arch
+// REQUIRES: page-size-4096
 
 #include 
 #include 
diff --git a/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py 
b/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py
index dceb7cde7218b..5f5486af3779f 100644
--- a/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py
+++ b/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py
@@ -12,3 +12,7 @@ def getRoot(config):
 # Android O (API level 26) has support for cross-dso cfi in libdl.so.
 if config.android and "android-26" not in config.available_features:
 config.unsupported = True
+
+# The runtime library only supports 4K pages.
+if "page-size-4096" not in config.available_features:
+config.unsupported = True
diff --git a/compiler-rt/test/dfsan/atomic.cpp 
b/compiler-rt/test/dfsan/atomic.cpp
index 22ee323c752f8..73e1cbd17a7cd 100644
--- a/compiler-rt/test/dfsan/atomic.cpp
+++ b/

[llvm-branch-commits] [llvm] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (PR #157845)

2025-09-12 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/157845
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libc++] Add ABI flag to make __tree nodes more compact (PR #147681)

2025-09-12 Thread Louis Dionne via llvm-branch-commits


@@ -98,6 +99,8 @@
 #  endif
 #endif
 
+#define _LIBCPP_ABI_TREE_POINTER_INT_PAIR

ldionne wrote:

Let's add some documentation for this. Also (or only?) in the `.rst` docs?

https://github.com/llvm/llvm-project/pull/147681
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/155993

>From e36db5923f8122cc56a99461b3e0030e06071a5d Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Fri, 29 Aug 2025 04:04:07 -0500
Subject: [PATCH] [flang][do concurent] Add saxpy offload tests for OpenMP
 mapping

Adds end-to-end tests for `do concurrent` offloading to the device.
---
 .../fortran/do-concurrent-to-omp-saxpy-2d.f90 | 53 +++
 .../fortran/do-concurrent-to-omp-saxpy.f90| 53 +++
 2 files changed, 106 insertions(+)
 create mode 100644 
offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
 create mode 100644 
offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90

diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 
b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | 
%fcheck-generic
+module saxpymod
+   use iso_fortran_env
+   public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+   use iso_fortran_env
+   implicit none
+   integer,intent(in) :: n, m
+   real(kind=real32),intent(in) :: a
+   real(kind=real32), dimension(:,:),intent(in) :: x
+   real(kind=real32), dimension(:,:),intent(inout) :: y
+   integer :: i, j
+
+   do concurrent(i=1:n, j=1:m)
+   y(i,j) = a * x(i,j) + y(i,j)
+   end do
+
+   write(*,*) "plausibility check:"
+   write(*,'("y(1,1) ",f8.6)') y(1,1)
+   write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+   use iso_fortran_env
+   use saxpymod, ONLY:saxpy
+   implicit none
+
+   integer,parameter :: n = 1000, m=1
+   real(kind=real32), allocatable, dimension(:,:) :: x, y
+   real(kind=real32) :: a
+   integer :: i
+
+   allocate(x(1:n,1:m), y(1:n,1:m))
+   a = 2.0_real32
+   x(:,:) = 1.0_real32
+   y(:,:) = 2.0_real32
+
+   call saxpy(a, x, y, n, m)
+
+   deallocate(x,y)
+end program main
+
+! CHECK:  "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK:  plausibility check:
+! CHECK:  y(1,1) 4.0
+! CHECK:  y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 
b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | 
%fcheck-generic
+module saxpymod
+   use iso_fortran_env
+   public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+   use iso_fortran_env
+   implicit none
+   integer,intent(in) :: n
+   real(kind=real32),intent(in) :: a
+   real(kind=real32), dimension(:),intent(in) :: x
+   real(kind=real32), dimension(:),intent(inout) :: y
+   integer :: i
+
+   do concurrent(i=1:n)
+   y(i) = a * x(i) + y(i)
+   end do
+
+   write(*,*) "plausibility check:"
+   write(*,'("y(1) ",f8.6)') y(1)
+   write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+   use iso_fortran_env
+   use saxpymod, ONLY:saxpy
+   implicit none
+
+   integer,parameter :: n = 1000
+   real(kind=real32), allocatable, dimension(:) :: x, y
+   real(kind=real32) :: a
+   integer :: i
+
+   allocate(x(1:n), y(1:n))
+   a = 2.0_real32
+   x(:) = 1.0_real32
+   y(:) = 2.0_real32
+
+   call saxpy(a, x, y, n)
+
+   deallocate(x,y)
+end program main
+
+! CHECK:  "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK:  plausibility check:
+! CHECK:  y(1) 4.0
+! CHECK:  y(n) 4.0

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/156837

>From c7d655214b726335a36eb0a9449b5d14df3699e9 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Thu, 4 Sep 2025 01:06:21 -0500
Subject: [PATCH] [flang][OpenMP] Support multi-block reduction combiner 
 regions on the GPU

Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The IP at the end of the inlined region was
not used, resulting in basic blocks with multiple terminators being emitted.
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  3 +
 .../omptarget-multi-block-reduction.mlir  | 85 +++
 2 files changed, 88 insertions(+)
 create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 3d5e487c8990f..fe00a2a5696dc 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -3506,6 +3506,8 @@ Expected 
OpenMPIRBuilder::createReductionFunction(
 return AfterIP.takeError();
   if (!Builder.GetInsertBlock())
 return ReductionFunc;
+
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHSPtr);
 }
   }
@@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy 
OpenMPIRBuilder::createReductionsGPU(
   RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
   if (!AfterIP)
 return AfterIP.takeError();
+  Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
   Builder.CreateStore(Reduced, LHS, false);
 }
   }
diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir 
b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
new file mode 100644
index 0..aaf06d2d0e0c2
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
@@ -0,0 +1,85 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Verifies that the IR builder can handle reductions with multi-block combiner
+// regions on the GPU.
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 
: ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = 
"amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} {
+  llvm.func @bar() {}
+  llvm.func @baz() {}
+
+  omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc {
+%0 = llvm.mlir.constant(1 : i64) : i64
+%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 
x array<3 x i64>>)> : (i64) -> !llvm.ptr<5>
+%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr
+omp.yield(%2 : !llvm.ptr)
+  } init {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+omp.yield(%arg1 : !llvm.ptr)
+  } combiner {
+  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+llvm.call @bar() : () -> ()
+llvm.br ^bb3
+
+  ^bb3:  // pred: ^bb1
+llvm.call @baz() : () -> ()
+omp.yield(%arg0 : !llvm.ptr)
+  }
+  llvm.func @foo_() {
+%c1 = llvm.mlir.constant(1 : i64) : i64
+%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> 
!llvm.ptr<5>
+%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr
+%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) 
map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"}
+omp.target map_entries(%74 -> %arg0 : !llvm.ptr) {
+  %c1_2 = llvm.mlir.constant(1 : i32) : i32
+  %c10 = llvm.mlir.constant(10 : i32) : i32
+  omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 
: !llvm.ptr) {
+omp.parallel {
+  omp.distribute {
+omp.wsloop {
+  omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step 
(%c1_2) {
+omp.yield
+  }
+} {omp.composite}
+  } {omp.composite}
+  omp.terminator
+} {omp.composite}
+omp.terminator
+  }
+  omp.terminator
+}
+llvm.return
+  }
+}
+
+// CHECK:  call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1,
+// CHECK-SAME:   ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1)
+
+// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} {
+// CHECK:   .omp.reduction.then:
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
+// CHECK: br label %[[BODY_2ND_BB:.*]]
+
+// CHECK:   [[BODY_2ND_BB]]:
+// CHECK: call void @baz()
+// CHECK: br label %[[CONT_BB:.*]]
+
+// CHECK:   [[CONT_BB]]:
+// CHECK: br label %.omp.reduction.done
+// CHECK: }
+
+// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef 
%0, ptr noundef %1) #0 {
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK:   [[BODY_2ND_BB:.*]]:
+// CHECK: call void @baz()
+// CHECK: br label %omp.region.cont
+
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK:   call void @bar()

[llvm-branch-commits] [libc++] Add build and CI support for pointer field protection (PR #152414)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits


@@ -411,6 +411,42 @@ bootstrapping-build)
 
 ccache -s
 ;;
+bootstrapping-build-pfp)

pcc wrote:

It's required because PFP support in the compiler is experimental and brand 
new, so it won't exist in compilers that are already installed on the target 
system. Once PFP becomes a stable feature supported in released compilers, we 
may convert this to a non-bootstrapping build.

https://github.com/llvm/llvm-project/pull/152414
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Add IR and codegen support for deactivation symbols. (PR #133536)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133536

>From f4c61b403c8a2c649741bae983196922143db44e Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Wed, 10 Sep 2025 18:02:38 -0700
Subject: [PATCH] Tweak LangRef

Created using spr 1.3.6-beta.1
---
 llvm/docs/LangRef.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 10586f03cff8e..5380413aec892 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3098,7 +3098,8 @@ Deactivation Symbol Operand Bundles
 A ``"deactivation-symbol"`` operand bundle is valid on the following
 instructions (AArch64 only):
 
-- Call to a normal function with ``notail`` attribute.
+- Call to a normal function with ``notail`` attribute and a first argument and
+  return value of type ``ptr``.
 - Call to ``llvm.ptrauth.sign`` or ``llvm.ptrauth.auth`` intrinsics.
 
 This operand bundle specifies that if the deactivation symbol is defined

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133537

>From e728f3444624a5f47f0af84c21fb3a584f3e05b7 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Fri, 1 Aug 2025 17:27:41 -0700
Subject: [PATCH 1/4] Add verifier check

Created using spr 1.3.6-beta.1
---
 llvm/lib/IR/Verifier.cpp   | 5 +
 llvm/test/Verifier/ptrauth-constant.ll | 6 ++
 2 files changed, 11 insertions(+)
 create mode 100644 llvm/test/Verifier/ptrauth-constant.ll

diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 3ff9895e161c4..3478c2c450ae7 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -2627,6 +2627,11 @@ void Verifier::visitConstantPtrAuth(const 
ConstantPtrAuth *CPA) {
 
   Check(CPA->getDiscriminator()->getBitWidth() == 64,
 "signed ptrauth constant discriminator must be i64 constant integer");
+
+  Check(isa(CPA->getDeactivationSymbol()) ||
+CPA->getDeactivationSymbol()->isNullValue(),
+"signed ptrauth constant deactivation symbol must be a global value "
+"or null");
 }
 
 bool Verifier::verifyAttributeCount(AttributeList Attrs, unsigned Params) {
diff --git a/llvm/test/Verifier/ptrauth-constant.ll 
b/llvm/test/Verifier/ptrauth-constant.ll
new file mode 100644
index 0..fdd6352cf8469
--- /dev/null
+++ b/llvm/test/Verifier/ptrauth-constant.ll
@@ -0,0 +1,6 @@
+; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s
+
+@g = external global i8
+
; CHECK: signed ptrauth constant deactivation symbol must be a global value 
or null
+@ptr = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr inttoptr 
(i64 16 to ptr))

>From 60e836e71bf9aabe9dade2bda1ca38107f76b599 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Mon, 8 Sep 2025 17:34:59 -0700
Subject: [PATCH 2/4] Address review comment

Created using spr 1.3.6-beta.1
---
 llvm/lib/IR/Constants.cpp | 1 +
 llvm/test/Assembler/invalid-ptrauth-const6.ll | 6 ++
 2 files changed, 7 insertions(+)
 create mode 100644 llvm/test/Assembler/invalid-ptrauth-const6.ll

diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index 5eacc7af1269b..53b292f90c03d 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -2082,6 +2082,7 @@ ConstantPtrAuth::ConstantPtrAuth(Constant *Ptr, 
ConstantInt *Key,
   assert(Key->getBitWidth() == 32);
   assert(Disc->getBitWidth() == 64);
   assert(AddrDisc->getType()->isPointerTy());
+  assert(DeactivationSymbol->getType()->isPointerTy());
   setOperand(0, Ptr);
   setOperand(1, Key);
   setOperand(2, Disc);
diff --git a/llvm/test/Assembler/invalid-ptrauth-const6.ll 
b/llvm/test/Assembler/invalid-ptrauth-const6.ll
new file mode 100644
index 0..6e8e1d386acc8
--- /dev/null
+++ b/llvm/test/Assembler/invalid-ptrauth-const6.ll
@@ -0,0 +1,6 @@
+; RUN: not llvm-as < %s 2>&1 | FileCheck %s
+
+@var = global i32 0
+
+; CHECK: error: constant ptrauth deactivation symbol must be a pointer
+@ptr = global ptr ptrauth (ptr @var, i32 0, i64 65535, ptr null, i64 0)

>From a780d181fa69236d5909759a24a1134b50313980 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Tue, 9 Sep 2025 17:18:49 -0700
Subject: [PATCH 3/4] Address review comment

Created using spr 1.3.6-beta.1
---
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 3 +++
 llvm/lib/IR/Verifier.cpp  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp 
b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 045ed204620fb..04fe4c57af6ed 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1613,6 +1613,9 @@ Expected 
BitcodeReader::materializeValue(unsigned StartValID,
   ConstOps.size() > 4 ? ConstOps[4]
   : ConstantPointerNull::get(cast(
 ConstOps[3]->getType()));
+  if (!DeactivationSymbol->getType()->isPointerTy())
+return error(
+"ptrauth deactivation symbol operand must be a pointer");
 
   C = ConstantPtrAuth::get(ConstOps[0], Key, Disc, ConstOps[3],
DeactivationSymbol);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 9e44dfb387615..a53ba17e26011 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -2632,6 +2632,9 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth 
*CPA) {
   Check(CPA->getDiscriminator()->getBitWidth() == 64,
 "signed ptrauth constant discriminator must be i64 constant integer");
 
+  Check(CPA->getDeactivationSymbol()->getType()->isPointerTy(),
+"signed ptrauth constant deactivation symbol must be a pointer");
+
   Check(isa(CPA->getDeactivationSymbol()) ||
 CPA->getDeactivationSymbol()->isNullValue(),
 "signed ptrauth constant deactivation symbol must be a global value "

>From 51c353bbde24f940e3dfd7488aec0682dbef260b Mon Se

[llvm-branch-commits] Add llvm.protected.field.ptr intrinsic and pre-ISel lowering. (PR #151647)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/151647


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/157638

>From 723193bcd43fc0be3e3e18b95e35d2ac8226aa18 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 2 Sep 2025 05:54:00 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device

Extends support for mapping `do concurrent` on the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and uses the mapped value as the
`private` clause operand in the nested `omp.parallel` op.
---
 .../include/flang/Optimizer/Dialect/FIROps.td |  12 ++
 .../OpenMP/DoConcurrentConversion.cpp | 192 +++---
 .../Transforms/DoConcurrent/local_device.mlir |  49 +
 3 files changed, 175 insertions(+), 78 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir

diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td 
b/flang/include/flang/Optimizer/Dialect/FIROps.td
index bc971e8fd6600..fc6eedc6ed4c6 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
   return getReduceVars().size();
 }
 
+unsigned getInductionVarsStart() {
+  return 0;
+}
+
+unsigned getLocalOperandsStart() {
+  return getNumInductionVars();
+}
+
+unsigned getReduceOperandsStart() {
+  return getLocalOperandsStart() + getNumLocalOperands();
+}
+
 mlir::Block::BlockArgListType getInductionVars() {
   return getBody()->getArguments().slice(0, getNumInductionVars());
 }
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp 
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 6c71924000842..d00a4fdd2cf2e 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
 
 liveIns.push_back(operand->get());
   });
+
+  for (mlir::Value local : loop.getLocalVars())
+liveIns.push_back(local);
 }
 
 /// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -298,8 +301,7 @@ class DoConcurrentConversion
   .getIsTargetDevice();
 
   mlir::omp::TargetOperands targetClauseOps;
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps,
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps,
isTargetDevice ? nullptr : &targetClauseOps);
 
   LiveInShapeInfoMap liveInShapeInfoMap;
@@ -321,14 +323,13 @@ class DoConcurrentConversion
 }
 
 mlir::omp::ParallelOp parallelOp =
-genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper);
+genParallelOp(rewriter, loop, ivInfos, mapper);
 
 // Only set as composite when part of `distribute parallel do`.
 parallelOp.setComposite(mapToDevice);
 
 if (!mapToDevice)
-  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
-   loopNestClauseOps);
+  genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps);
 
 for (mlir::Value local : locals)
   looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
@@ -337,10 +338,38 @@ class DoConcurrentConversion
 if (mapToDevice)
   genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true);
 
-mlir::omp::LoopNestOp ompLoopNest =
+auto [loopNestOp, wsLoopOp] =
 genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps,
 /*isComposite=*/mapToDevice);
 
+// `local` region arguments are transferred/cloned from the `do concurrent`
+// loop to the loopnest op when the region is cloned above. Instead, these
+// region arguments should be on the workshare loop's region.
+if (mapToDevice) {
+  for (auto [parallelArg, loopNestArg] : llvm::zip_equal(
+   parallelOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getLocalOperandsStart(), loop.getNumLocalOperands(
+rewriter.replaceAllUsesWith(loopNestArg, parallelArg);
+
+  for (auto [wsloopArg, loopNestArg] : llvm::zip_equal(
+   wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().slice(
+   loop.getReduceOperandsStart(), 
loop.getNumReduceOperands(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+} else {
+  for (auto [wsloopArg, loopNestArg] :
+   llvm::zip_equal(wsLoopOp.getRegion().getArguments(),
+   loopNestOp.getRegion().getArguments().drop_front(
+   loopNestClauseOps.loopLowerBounds.size(
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+}
+
+for (unsigned i = 0;
+ i 

[llvm-branch-commits] [libcxx] [libc++] Add ABI flag to make __tree nodes more compact (PR #147681)

2025-09-12 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne commented:

LGTM but let's A/B measure this to see whether there is a visible impact. I'm 
especially looking for a regression caused by more expensive pointer chasing 
since we have to "decode" the pointer now. If we don't see issues with this, I 
think I'd be OK with making this the new "de facto" ABI for v2 unconditionally.

Also, this obviously needs `pointer_int_pair` to land.

https://github.com/llvm/llvm-project/pull/147681


[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -1274,6 +1274,12 @@ def AllocaWithAlignUninitialized : Builtin {
   let Prototype = "void*(size_t, _Constant size_t)";
 }
 
+def AllocTokenInfer : Builtin {
+  let Spellings = ["__builtin_alloc_token_infer"];

ojhunt wrote:

I think `__builtin_infer_alloc_token` sounds better? I can't think of a way to 
easily infer from returns :-/

A developer can work with out parameters though - at least in the macro use 
case `infer(*out_param_expr)`

https://github.com/llvm/llvm-project/pull/156842


[llvm-branch-commits] [clang] [clang-tools-extra] [compiler-rt] [libcxx] [libcxxabi] [libunwind] [lldb] [llvm] [mlir] [openmp] release/21.x: [CMake][AIX] quote the string AIX `if` conditions (PR #1565

2025-09-12 Thread David Tenty via llvm-branch-commits

daltenty wrote:

> Uhm - this looks pretty big and seems like something that can easily break 
> certain build configurations since it doesn't seem to touch only AIX

Agreed that this looks big and scary, but it's a purely mechanical change that 
is a no-op for most targets. I'll add a long-form rationale at the end of this 
comment about why I don't think the patch affects anyone but AIX, to keep my 
answers brief.

>Is this in main without any issues?

Yes, these patches have been in main for several weeks at this point with no 
reported issues.

>  Does it really NEED to be merged for the release branch at this point?

It would help us out for the point releases. Without this patch, we're unable 
to build on AIX with CMake from our package manager (4.0). We can manually 
downgrade if we're unwilling

**Rationale about why the patch doesn't affect targets besides AIX**

We quote the string AIX and any variable expansions that might expand to the 
string AIX (i.e. `CMAKE_SYSTEM_NAME`), so that we get the intended string 
comparison. If not quoted, the `if` will expand the string whenever it happens 
to match a variable name (which `AIX` does in CMake 4.0+).

This has an effect only if `CMAKE_SYSTEM_NAME` 
(https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_NAME.html) expands 
to something which is a CMake variable 
(https://cmake.org/cmake/help/latest/manual/cmake-variables.7.html#variables-that-describe-the-system)

Intersecting the two list gives me the following list of affect targets:
```
AIX
CYGWIN
MSYS
WASI
```

Of those targets, only CYGWIN appears in the lines affected by the patch, and 
it's already using a variable check (i.e. it checks `CYGWIN`) not a string 
comparison to `CMAKE_SYSTEM_NAME`, so it's unaffected.
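
To illustrate the quoting behavior described above, here is a minimal sketch 
(not taken from the patch itself; the `1` value for the `AIX` variable is an 
assumption for illustration):

```cmake
# Unquoted arguments to if() are dereferenced when they name a defined
# variable. On AIX hosts with CMake 4.0+, a variable named `AIX` exists,
# so the right-hand side expands before the comparison happens:
if(CMAKE_SYSTEM_NAME STREQUAL AIX)      # compares "AIX" with the value of ${AIX}
  message("may never be reached on AIX with CMake 4.0+")
endif()

# Quoting forces a literal string comparison (per policy CMP0054, quoted
# arguments are not dereferenced), which is the intended check and is a
# no-op change for every target whose name is not also a variable:
if("${CMAKE_SYSTEM_NAME}" STREQUAL "AIX")
  message("reached on AIX regardless of CMake version")
endif()
```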

https://github.com/llvm/llvm-project/pull/156505


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits


@@ -256,9 +256,12 @@ void unique_ptr_test() {
   ComparePrettyPrintToRegex(std::move(forty_two),
   R"(std::unique_ptr containing = {__ptr_ = 0x[a-f0-9]+})");
 
+#if !defined(__POINTER_FIELD_PROTECTION__)
+  // GDB doesn't know how to read PFP fields correctly yet.

pcc wrote:

The support for this feature in GCC is independent of support in GDB. We could 
imagine debug info extensions being developed in the future to make it possible 
for this to pass in GDB even without GCC support.

That being said, disabling the test is also fine with me.

https://github.com/llvm/llvm-project/pull/151651


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Nikolas Klauser via llvm-branch-commits


@@ -256,9 +256,12 @@ void unique_ptr_test() {
   ComparePrettyPrintToRegex(std::move(forty_two),
   R"(std::unique_ptr containing = {__ptr_ = 0x[a-f0-9]+})");
 
+#if !defined(__POINTER_FIELD_PROTECTION__)
+  // GDB doesn't know how to read PFP fields correctly yet.

philnik777 wrote:

Does GCC have pfp in general? If not, IMO we should just disable the pretty 
printer test with pfp enabled.

https://github.com/llvm/llvm-project/pull/151651


[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)

2025-09-12 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/157232

>From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Sun, 7 Sep 2025 17:39:19 +
Subject: [PATCH 1/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 17 +
 llvm/test/MC/COFF/stdin.s  |  1 -
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/COFF/stdin.py
 delete mode 100644 llvm/test/MC/COFF/stdin.s

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
new file mode 100644
index 0..8b7b6ae1fba13
--- /dev/null
+++ b/llvm/test/MC/COFF/stdin.py
@@ -0,0 +1,17 @@
+# RUN: echo "// comment" > %t.input
+# RUN: which llvm-mc | %python %s %t
+
+import subprocess
+import sys
+
+llvm_mc_binary = sys.stdin.readlines()[0].strip()
+temp_file = sys.argv[1]
+input_file = temp_file + ".input"
+
+with open(temp_file, "w") as mc_stdout:
+mc_stdout.seek(4)
+subprocess.run(
+[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],
+stdout=mc_stdout,
+check=True,
+)
diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s
deleted file mode 100644
index 8ceae7fdef501..0
--- a/llvm/test/MC/COFF/stdin.s
+++ /dev/null
@@ -1 +0,0 @@
-// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s 
) > %t'

>From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:28:59 +
Subject: [PATCH 2/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 8b7b6ae1fba13..1d9b50c022523 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,14 +1,22 @@
 # RUN: echo "// comment" > %t.input
 # RUN: which llvm-mc | %python %s %t
 
+import argparse
 import subprocess
 import sys
 
+parser = argparse.ArgumentParser()
+parser.add_argument("temp_file")
+arguments = parser.parse_args()
+
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = sys.argv[1]
+temp_file = arguments.temp_file
 input_file = temp_file + ".input"
 
 with open(temp_file, "w") as mc_stdout:
+## We need to test that starting on an input stream with a non-zero offset
+## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
+## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
 [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],

>From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 8 Sep 2025 17:43:39 +
Subject: [PATCH 3/3] feedback

Created using spr 1.3.6
---
 llvm/test/MC/COFF/stdin.py | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py
index 1d9b50c022523..0da1b4895142b 100644
--- a/llvm/test/MC/COFF/stdin.py
+++ b/llvm/test/MC/COFF/stdin.py
@@ -1,25 +1,30 @@
 # RUN: echo "// comment" > %t.input
-# RUN: which llvm-mc | %python %s %t
+# RUN: which llvm-mc | %python %s %t.input %t
 
 import argparse
 import subprocess
 import sys
 
 parser = argparse.ArgumentParser()
+parser.add_argument("input_file")
 parser.add_argument("temp_file")
 arguments = parser.parse_args()
 
 llvm_mc_binary = sys.stdin.readlines()[0].strip()
-temp_file = arguments.temp_file
-input_file = temp_file + ".input"
 
-with open(temp_file, "w") as mc_stdout:
+with open(arguments.temp_file, "w") as mc_stdout:
 ## We need to test that starting on an input stream with a non-zero offset
 ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek
 ## past zero for STDOUT.
 mc_stdout.seek(4)
 subprocess.run(
-[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", 
input_file],
+[
+llvm_mc_binary,
+"-filetype=obj",
+"-triple",
+"i686-pc-win32",
+arguments.input_file,
+],
 stdout=mc_stdout,
 check=True,
 )



[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterle… (PR #158013)

2025-09-12 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn created 
https://github.com/llvm/llvm-project/pull/158013

…aveGroups.

Track which ops have already been narrowed, to avoid narrowing the same 
operation multiple times. Repeated narrowing will lead to incorrect results, 
because we could first narrow from an interleave group -> wide load, and then 
narrow the wide load -> single-scalar load.

Fixes https://github.com/llvm/llvm-project/issues/156190.

>From 93505953fea754e6bbb1edb5fca75097132377b5 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Wed, 10 Sep 2025 17:09:49 +0100
Subject: [PATCH] release/21.x: [VPlan] Don't narrow op multiple times in
 narrowInterleaveGroups.

Track which ops have already been narrowed, to avoid narrowing the same
operation multiple times. Repeated narrowing will lead to incorrect
results, because we could first narrow from an interleave group -> wide
load, and then narrow the wide load -> single-scalar load.

Fixes https://github.com/llvm/llvm-project/issues/156190.
---
 .../Transforms/Vectorize/VPlanTransforms.cpp  |  8 +-
 ...nterleave-to-widen-memory-with-wide-ops.ll | 79 +++
 ...sform-narrow-interleave-to-widen-memory.ll | 73 +
 3 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 6a3b3e6e41955..f7c1c10185c68 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3252,9 +3252,10 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan 
&Plan, ElementCount VF,
 return;
 
   // Convert InterleaveGroup \p R to a single VPWidenLoadRecipe.
-  auto NarrowOp = [](VPValue *V) -> VPValue * {
+  SmallPtrSet NarrowedOps;
+  auto NarrowOp = [&NarrowedOps](VPValue *V) -> VPValue * {
 auto *R = V->getDefiningRecipe();
-if (!R)
+if (!R || NarrowedOps.contains(V))
   return V;
 if (auto *LoadGroup = dyn_cast(R)) {
   // Narrow interleave group to wide load, as transformed VPlan will only
@@ -3264,6 +3265,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
   LoadGroup->getAddr(), LoadGroup->getMask(), /*Consecutive=*/true,
   /*Reverse=*/false, {}, LoadGroup->getDebugLoc());
   L->insertBefore(LoadGroup);
+  NarrowedOps.insert(L);
   return L;
 }
 
@@ -3271,6 +3273,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
   assert(RepR->isSingleScalar() &&
  isa(RepR->getUnderlyingInstr()) &&
  "must be a single scalar load");
+  NarrowedOps.insert(RepR);
   return RepR;
 }
 auto *WideLoad = cast(R);
@@ -3281,6 +3284,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
 WideLoad->operands(), /*IsUniform*/ true,
 /*Mask*/ nullptr, *WideLoad);
 N->insertBefore(WideLoad);
+NarrowedOps.insert(N);
 return N;
   };
 
diff --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
 
b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
index 813d61b52100f..aec6c0be6dde2 100644
--- 
a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
+++ 
b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
@@ -1203,3 +1203,82 @@ loop:
 exit:
   ret void
 }
+
+; Make sure multiple uses of a narrowed op are handled correctly,
+; https://github.com/llvm/llvm-project/issues/156190.
+define void @multiple_store_groups_storing_same_wide_bin_op(ptr noalias %A, 
ptr noalias %B, ptr noalias %C) {
+; VF2-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op(
+; VF2-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias 
[[C:%.*]]) {
+; VF2-NEXT:  [[ENTRY:.*:]]
+; VF2-NEXT:br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; VF2:   [[VECTOR_PH]]:
+; VF2-NEXT:br label %[[VECTOR_BODY:.*]]
+; VF2:   [[VECTOR_BODY]]:
+; VF2-NEXT:[[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; VF2-NEXT:[[TMP0:%.*]] = getelementptr { double, double }, ptr [[A]], i64 
[[INDEX]]
+; VF2-NEXT:[[BROADCAST_SPLAT:%.*]] = load <2 x double>, ptr [[TMP0]], 
align 8
+; VF2-NEXT:[[TMP2:%.*]] = fadd contract <2 x double> [[BROADCAST_SPLAT]], 
splat (double 2.00e+01)
+; VF2-NEXT:[[TMP3:%.*]] = getelementptr { double, double }, ptr [[B]], i64 
[[INDEX]]
+; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP3]], align 8
+; VF2-NEXT:[[TMP4:%.*]] = getelementptr { double, double }, ptr [[C]], i64 
[[INDEX]]
+; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP4]], align 8
+; VF2-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
+; VF2-NEXT:[[TMP5:%.*]] = icmp eq i64 [[IN

[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits


@@ -26,6 +26,12 @@
 #  include 
 #endif
 
+#if defined(__POINTER_FIELD_PROTECTION__)
+constexpr bool pfp_disabled = false;
+#else
+constexpr bool pfp_disabled = true;
+#endif

pcc wrote:

That's fine with me, I suppose. The correct result for 
`__libcpp_is_trivially_relocatable` is implicitly tested by the other tests 
(which would crash if it were wrong).

https://github.com/llvm/llvm-project/pull/151651


[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)

2025-09-12 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/158013


[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)

2025-09-12 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn milestoned 
https://github.com/llvm/llvm-project/pull/158013


[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)

2025-09-12 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)


Changes

Track which ops have already been narrowed, to avoid narrowing the same 
operation multiple times. Repeated narrowing will lead to incorrect results, 
because we could first narrow from an interleave group -> wide load, and 
then narrow the wide load -> single-scalar load.

Fixes https://github.com/llvm/llvm-project/issues/156190.

---
Full diff: https://github.com/llvm/llvm-project/pull/158013.diff


3 Files Affected:

- (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+6-2) 
- (modified) 
llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
 (+79) 
- (modified) 
llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll
 (+73) 


``diff
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 6a3b3e6e41955..f7c1c10185c68 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3252,9 +3252,10 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan 
&Plan, ElementCount VF,
 return;
 
   // Convert InterleaveGroup \p R to a single VPWidenLoadRecipe.
-  auto NarrowOp = [](VPValue *V) -> VPValue * {
+  SmallPtrSet NarrowedOps;
+  auto NarrowOp = [&NarrowedOps](VPValue *V) -> VPValue * {
 auto *R = V->getDefiningRecipe();
-if (!R)
+if (!R || NarrowedOps.contains(V))
   return V;
 if (auto *LoadGroup = dyn_cast(R)) {
   // Narrow interleave group to wide load, as transformed VPlan will only
@@ -3264,6 +3265,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
   LoadGroup->getAddr(), LoadGroup->getMask(), /*Consecutive=*/true,
   /*Reverse=*/false, {}, LoadGroup->getDebugLoc());
   L->insertBefore(LoadGroup);
+  NarrowedOps.insert(L);
   return L;
 }
 
@@ -3271,6 +3273,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
   assert(RepR->isSingleScalar() &&
  isa(RepR->getUnderlyingInstr()) &&
  "must be a single scalar load");
+  NarrowedOps.insert(RepR);
   return RepR;
 }
 auto *WideLoad = cast(R);
@@ -3281,6 +3284,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, 
ElementCount VF,
 WideLoad->operands(), /*IsUniform*/ true,
 /*Mask*/ nullptr, *WideLoad);
 N->insertBefore(WideLoad);
+NarrowedOps.insert(N);
 return N;
   };
 
diff --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
 
b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
index 813d61b52100f..aec6c0be6dde2 100644
--- 
a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
+++ 
b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
@@ -1203,3 +1203,82 @@ loop:
 exit:
   ret void
 }
+
+; Make sure multiple uses of a narrowed op are handled correctly,
+; https://github.com/llvm/llvm-project/issues/156190.
+define void @multiple_store_groups_storing_same_wide_bin_op(ptr noalias %A, 
ptr noalias %B, ptr noalias %C) {
+; VF2-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op(
+; VF2-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias 
[[C:%.*]]) {
+; VF2-NEXT:  [[ENTRY:.*:]]
+; VF2-NEXT:br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; VF2:   [[VECTOR_PH]]:
+; VF2-NEXT:br label %[[VECTOR_BODY:.*]]
+; VF2:   [[VECTOR_BODY]]:
+; VF2-NEXT:[[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; VF2-NEXT:[[TMP0:%.*]] = getelementptr { double, double }, ptr [[A]], i64 
[[INDEX]]
+; VF2-NEXT:[[BROADCAST_SPLAT:%.*]] = load <2 x double>, ptr [[TMP0]], 
align 8
+; VF2-NEXT:[[TMP2:%.*]] = fadd contract <2 x double> [[BROADCAST_SPLAT]], 
splat (double 2.00e+01)
+; VF2-NEXT:[[TMP3:%.*]] = getelementptr { double, double }, ptr [[B]], i64 
[[INDEX]]
+; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP3]], align 8
+; VF2-NEXT:[[TMP4:%.*]] = getelementptr { double, double }, ptr [[C]], i64 
[[INDEX]]
+; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP4]], align 8
+; VF2-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
+; VF2-NEXT:[[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
+; VF2-NEXT:br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label 
%[[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
+; VF2:   [[MIDDLE_BLOCK]]:
+; VF2-NEXT:br i1 true, [[EXIT:label %.*]], label %[[SCALAR_PH]]
+; VF2:   [[SCALAR_PH]]:
+;
+; VF4-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op(
+; VF4-SAME: ptr noalias

[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)

2025-09-12 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/158013


[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits


@@ -484,8 +484,21 @@ typedef __char32_t char32_t;
 #define _LIBCPP_EXCEPTIONS_SIG e
 #  endif
 
+#  if !_LIBCPP_HAS_EXCEPTIONS
+#define _LIBCPP_EXCEPTIONS_SIG n
+#  else
+#define _LIBCPP_EXCEPTIONS_SIG e
+#  endif
+
+#  if defined(__POINTER_FIELD_PROTECTION__)
+#define _LIBCPP_PFP_SIG p
+#  else
+#define _LIBCPP_PFP_SIG
+#  endif

pcc wrote:

Yes, the in-memory pointer format changes so it's effectively a layout (ABI) 
change. Therefore we need an ABI tag change to detect/prevent linking against 
mismatching ABIs. This was requested by @mordante in #133538.

https://github.com/llvm/llvm-project/pull/151651


[llvm-branch-commits] [mlir] 1e192c0 - Revert "[MLIR] Remove CopyOpInterface (#157711)"

2025-09-12 Thread via llvm-branch-commits

Author: Mehdi Amini
Date: 2025-09-11T10:25:36+01:00
New Revision: 1e192c006bf978fad12dbc4bba8c6213b6b9c907

URL: 
https://github.com/llvm/llvm-project/commit/1e192c006bf978fad12dbc4bba8c6213b6b9c907
DIFF: 
https://github.com/llvm/llvm-project/commit/1e192c006bf978fad12dbc4bba8c6213b6b9c907.diff

LOG: Revert "[MLIR] Remove CopyOpInterface (#157711)"

This reverts commit 63647074ba97dc606c7ba48c3800ec08ca501d92.

Added: 
mlir/include/mlir/Interfaces/CopyOpInterface.h
mlir/include/mlir/Interfaces/CopyOpInterface.td
mlir/lib/Interfaces/CopyOpInterface.cpp

Modified: 
mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
mlir/include/mlir/Interfaces/CMakeLists.txt
mlir/lib/Interfaces/CMakeLists.txt
mlir/test/lib/Dialect/Test/TestDialect.h
mlir/test/lib/Dialect/Test/TestOps.h
mlir/test/lib/Dialect/Test/TestOps.td

Removed: 




diff  --git a/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h 
b/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h
index e735651d5366d..1ef5370802953 100644
--- a/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h
+++ b/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h
@@ -12,6 +12,7 @@
 #include "mlir/Bytecode/BytecodeOpInterface.h"
 #include "mlir/Dialect/Bufferization/IR/AllocationOpInterface.h"
 #include "mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h"
+#include "mlir/Interfaces/CopyOpInterface.h"
 #include "mlir/Interfaces/DestinationStyleOpInterface.h"
 #include "mlir/Interfaces/InferTypeOpInterface.h"
 #include "mlir/Interfaces/SubsetOpInterface.h"

diff  --git a/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td 
b/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
index 6724d4c483101..271b42025e0af 100644
--- a/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
+++ b/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
@@ -18,6 +18,7 @@ include "mlir/Interfaces/DestinationStyleOpInterface.td"
 include "mlir/Interfaces/InferTypeOpInterface.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Interfaces/SubsetOpInterface.td"
+include "mlir/Interfaces/CopyOpInterface.td"
 
 class Bufferization_Op traits = []>
 : Op;
@@ -170,6 +171,7 @@ def Bufferization_AllocTensorOp : 
Bufferization_Op<"alloc_tensor",
 
//===--===//
 
 def Bufferization_CloneOp : Bufferization_Op<"clone", [
+CopyOpInterface,
 MemoryEffectsOpInterface,
 DeclareOpInterfaceMethods
   ]> {

diff  --git a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h 
b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
index 9de6d8fd50983..eb4e3810f0d07 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
+++ b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h
@@ -22,6 +22,7 @@
 #include "mlir/IR/ImplicitLocOpBuilder.h"
 #include "mlir/IR/TypeUtilities.h"
 #include "mlir/Interfaces/ControlFlowInterfaces.h"
+#include "mlir/Interfaces/CopyOpInterface.h"
 #include "mlir/Interfaces/DestinationStyleOpInterface.h"
 #include "mlir/Interfaces/InferTypeOpInterface.h"
 #include "mlir/Interfaces/SideEffectInterfaces.h"

diff  --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h 
b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
index bdec699eb4ce4..ac383ab46e7a5 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h
@@ -16,6 +16,7 @@
 #include "mlir/Interfaces/CallInterfaces.h"
 #include "mlir/Interfaces/CastInterfaces.h"
 #include "mlir/Interfaces/ControlFlowInterfaces.h"
+#include "mlir/Interfaces/CopyOpInterface.h"
 #include "mlir/Interfaces/InferIntRangeInterface.h"
 #include "mlir/Interfaces/InferTypeOpInterface.h"
 #include "mlir/Interfaces/MemorySlotInterfaces.h"

diff  --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td 
b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
index 513a9a18198a3..d6b7a97179b71 100644
--- a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
@@ -13,6 +13,7 @@ include "mlir/Dialect/Arith/IR/ArithBase.td"
 include "mlir/Dialect/MemRef/IR/MemRefBase.td"
 include "mlir/Interfaces/CastInterfaces.td"
 include "mlir/Interfaces/ControlFlowInterfaces.td"
+include "mlir/Interfaces/CopyOpInterface.td"
 include "mlir/Interfaces/InferIntRangeInterface.td"
 include "mlir/Interfaces/InferTypeOpInterface.td"
 include "mlir/Interfaces/MemorySlotInterfaces.td"
@@ -529,7 +530,7 @@ def MemRef_CastOp : MemRef_Op<"cast", [
 // CopyOp
 
//===--===//
 
-def CopyOp : MemRef_Op<"copy", [SameOperandsElementType,
+def CopyOp : MemRef_Op<"copy", [

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/145330

>From 41b0c715809685ab360559cf47af2fa3ddb8f036 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 17 Jun 2025 04:03:53 -0400
Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special
 cases

There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.

For SWDEV-516125.
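A hedged aside (illustration only, not part of the patch): the patch can route ISD::PTRADD through the existing ISD::ADD known-bits handling because, once a pointer is treated as a plain integer, a pointer add propagates known bits exactly like an integer add. A minimal Python sketch of that reasoning, with a hypothetical helper name:

```python
# Minimal sketch (hypothetical helper, not LLVM code): known low bits of
# base + offset when the base is 16-byte aligned. The same reasoning holds
# for ISD::ADD and ISD::PTRADD once the pointer is modeled as an integer,
# which is why KnownBits::computeForAddSub can serve both opcodes.

def known_low_bits(align: int, offset: int, nbits: int = 4) -> int:
    """Low `nbits` bits of base + offset, for any base aligned to `align`."""
    assert align % (1 << nbits) == 0, "alignment must cover the queried bits"
    # An aligned base contributes only zeros below the alignment, so the
    # low bits come entirely from the offset.
    return offset & ((1 << nbits) - 1)

# Any 16-byte-aligned base plus 4 ends in the nibble 4:
for base in (0x1000, 0x20F0, 0xFFF0):
    assert (base + 4) & 0xF == known_low_bits(16, 4)
```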
---
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp |   2 +-
 .../CodeGen/SelectionDAG/TargetLowering.cpp   |  21 +-
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |   6 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   7 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |  67 ++
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 196 ++
 6 files changed, 105 insertions(+), 194 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index bcf25958d0982..4ce58c0027aa6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -8554,7 +8554,7 @@ static bool isMemSrcFromConstant(SDValue Src, 
ConstantDataArraySlice &Slice) {
   GlobalAddressSDNode *G = nullptr;
   if (Src.getOpcode() == ISD::GlobalAddress)
 G = cast<GlobalAddressSDNode>(Src);
-  else if (Src.getOpcode() == ISD::ADD &&
+  else if (Src->isAnyAdd() &&
Src.getOperand(0).getOpcode() == ISD::GlobalAddress &&
Src.getOperand(1).getOpcode() == ISD::Constant) {
 G = cast<GlobalAddressSDNode>(Src.getOperand(0));
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index fd6d20e146bb2..e4d45f14a0c44 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -638,8 +638,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned 
BitWidth,
   // operands on the new node are also disjoint.
   SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint
  : SDNodeFlags::None);
+  unsigned Opcode = Op.getOpcode();
+  if (Opcode == ISD::PTRADD) {
+// It isn't a ptradd anymore if it doesn't operate on the entire
+// pointer.
+Opcode = ISD::ADD;
+  }
   SDValue X = DAG.getNode(
-  Op.getOpcode(), dl, SmallVT,
+  Opcode, dl, SmallVT,
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)),
   DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags);
   assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?");
@@ -2860,6 +2866,11 @@ bool TargetLowering::SimplifyDemandedBits(
   return TLO.CombineTo(Op, And1);
 }
 [[fallthrough]];
+  case ISD::PTRADD:
+if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType())
+  break;
+// PTRADD behaves like ADD if pointers are represented as integers.
+[[fallthrough]];
   case ISD::ADD:
   case ISD::SUB: {
 // Add, Sub, and Mul don't demand any bits in positions beyond that
@@ -2969,10 +2980,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
 if (Op.getOpcode() == ISD::MUL) {
   Known = KnownBits::mul(KnownOp0, KnownOp1);
-} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB.
+} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB.
   Known = KnownBits::computeForAddSub(
-  Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(),
-  Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1);
+  Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(),
+  KnownOp0, KnownOp1);
 }
 break;
   }
@@ -5675,7 +5686,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const 
GlobalValue *&GA,
 return true;
   }
 
-  if (N->getOpcode() == ISD::ADD) {
+  if (N->isAnyAdd()) {
 SDValue N1 = N->getOperand(0);
 SDValue N2 = N->getOperand(1);
 if (isGAPlusOffset(N1.getNode(), GA, Offset)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 3785d0f7f2688..a0c2e60efcd9a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -1531,7 +1531,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, 
SDValue &Ptr, SDValue &VAddr,
   C1 = nullptr;
   }
 
-  if (N0.getOpcode() == ISD::ADD) {
+  if (N0->isAnyAdd()) {
 // (add N2, N3) -> addr64, or
 // (add (add N2, N3), C1) -> addr64
 SDValue N2 = N0.getOperand(0);
@@ -1993,7 +1993,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, 
SDValue Addr,
   }
 
   // Match the variable offset.
-  if (Addr.getOpcode() == ISD::ADD) {
+  if (Addr->isAnyAdd()) {
 LHS = Addr.getOperand(0);
 
 if (!LHS

[llvm-branch-commits] [llvm] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (PR #146074)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146074

>From 62623004e49ca66a426455e4b3ac4028f10f68fd Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 26 Jun 2025 06:10:35 -0400
Subject: [PATCH 1/2] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD
 transforms

This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.

This is motivated by target features like AArch64's checked pointer
arithmetic, CPA, which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
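A hedged illustration of the hazard (toy integer model, not LLVM code): a fold such as (ptradd (ptradd GA, v), c) -> (ptradd (ptradd GA, c), v) preserves the final address but changes the intermediate one, and on a checked-pointer target that intermediate can fault:

```python
# Toy bounds model for one 16-byte object; addresses are plain integers.
GA, SIZE = 0x1000, 16

def in_bounds(addr: int) -> bool:
    # One-past-the-end is allowed, as for C/C++ pointers.
    return GA <= addr <= GA + SIZE

v, c = 12, -8
# Original association: every intermediate stays in bounds.
assert in_bounds(GA + v) and in_bounds((GA + v) + c)
# The reassociated form reaches the same final address...
assert (GA + c) + v == (GA + v) + c
# ...but its intermediate escapes the object, which a CPA-style check
# would reject. Targets without such checks can opt in via the new hook.
assert not in_bounds(GA + c)
```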
---
 llvm/include/llvm/CodeGen/TargetLowering.h|   7 +
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 125 +++---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  59 ++---
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |   3 +
 4 files changed, 94 insertions(+), 100 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 2ba8b29e775e0..d3aa168aaa861 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3518,6 +3518,13 @@ class LLVM_ABI TargetLoweringBase {
 return false;
   }
 
+  /// True if the target allows transformations of in-bounds pointer
+  /// arithmetic that cause out-of-bounds intermediate results.
+  virtual bool canTransformPtrArithOutOfBounds(const Function &F,
+   EVT PtrVT) const {
+return false;
+  }
+
   /// Does this target support complex deinterleaving
   virtual bool isComplexDeinterleavingSupported() const { return false; }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d130efe96b56b..9ee74cf5fbbdd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -2696,59 +2696,82 @@ SDValue DAGCombiner::visitPTRADD(SDNode *N) {
   if (PtrVT == IntVT && isNullConstant(N0))
 return N1;
 
-  if (N0.getOpcode() != ISD::PTRADD ||
-  reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1))
-return SDValue();
-
-  SDValue X = N0.getOperand(0);
-  SDValue Y = N0.getOperand(1);
-  SDValue Z = N1;
-  bool N0OneUse = N0.hasOneUse();
-  bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
-  bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
-
-  // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
-  //   * y is a constant and (ptradd x, y) has one use; or
-  //   * y and z are both constants.
-  if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) {
-// If both additions in the original were NUW, the new ones are as well.
-SDNodeFlags Flags =
-(N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap;
-SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags);
-AddToWorklist(Add.getNode());
-return DAG.getMemBasePlusOffset(X, Add, DL, Flags);
+  if (N0.getOpcode() == ISD::PTRADD &&
+  !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) 
{
+SDValue X = N0.getOperand(0);
+SDValue Y = N0.getOperand(1);
+SDValue Z = N1;
+bool N0OneUse = N0.hasOneUse();
+bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y);
+bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z);
+
+// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if:
+//   * y is a constant and (ptradd x, y) has one use; or
+//   * y and z are both constants.
+if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) {
+  // If both additions in the original were NUW, the new ones are as well.
+  SDNodeFlags Flags =
+  (N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap;
+  SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags);
+  AddToWorklist(Add.getNode());
+  return DAG.getMemBasePlusOffset(X, Add, DL, Flags);
+}
+  }
+
+  // The following combines can turn in-bounds pointer arithmetic out of 
bounds.
+  // That is problematic for settings like AArch64's CPA, which checks that
+  // intermediate results of pointer arithmetic remain in bounds. The target
+  // therefore needs to opt-in to enable them.
+  if (!TLI.canTransformPtrArithOutOfBounds(
+  DAG.getMachineFunction().getFunction(), PtrVT))
+return SDValue();
+
+  if (N0.getOpcode() == ISD::PTRADD && N1.getOpcode() == ISD::Constant) {
+// Fold (ptradd (ptradd GA, v), c) -> (ptradd (ptradd GA, c) v) with
+// global address GA and constant c, such that c can be folded into GA.
+SDValue GAValue = N0.getOperand(0);
+if (const GlobalAddressSDNode *GA =
+dyn_cast<GlobalAddressSDNode>(GAValue)) {
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  if (!LegalOperations && TLI.

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (PR #145329)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/145329

>From 345456442d0d9e5a8babd9b72b8343d6608399d5 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 17 Jun 2025 03:51:19 -0400
Subject: [PATCH] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special
 cases

Pre-committing tests to show improvements in a follow-up PR.
---
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |  63 ++
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 206 ++
 2 files changed, 269 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll

diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll
new file mode 100644
index 0..fab56383ffa8a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll
@@ -0,0 +1,63 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=1 < 
%s | FileCheck --check-prefixes=GFX6,GFX6_PTRADD %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=0 < 
%s | FileCheck --check-prefixes=GFX6,GFX6_LEGACY %s
+
+; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectMUBUF.
+
+define amdgpu_kernel void @v_add_i32(ptr addrspace(1) %out, ptr addrspace(1) 
%in) {
+; GFX6_PTRADD-LABEL: v_add_i32:
+; GFX6_PTRADD:   ; %bb.0:
+; GFX6_PTRADD-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX6_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX6_PTRADD-NEXT:s_mov_b32 s7, 0x100f000
+; GFX6_PTRADD-NEXT:s_mov_b32 s10, 0
+; GFX6_PTRADD-NEXT:s_mov_b32 s11, s7
+; GFX6_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX6_PTRADD-NEXT:v_mov_b32_e32 v1, s3
+; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, s2, v0
+; GFX6_PTRADD-NEXT:v_addc_u32_e32 v1, vcc, 0, v1, vcc
+; GFX6_PTRADD-NEXT:s_mov_b32 s8, s10
+; GFX6_PTRADD-NEXT:s_mov_b32 s9, s10
+; GFX6_PTRADD-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc
+; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0)
+; GFX6_PTRADD-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 
offset:4 glc
+; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0)
+; GFX6_PTRADD-NEXT:s_mov_b32 s6, -1
+; GFX6_PTRADD-NEXT:s_mov_b32 s4, s0
+; GFX6_PTRADD-NEXT:s_mov_b32 s5, s1
+; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, v2, v0
+; GFX6_PTRADD-NEXT:buffer_store_dword v0, off, s[4:7], 0
+; GFX6_PTRADD-NEXT:s_endpgm
+;
+; GFX6_LEGACY-LABEL: v_add_i32:
+; GFX6_LEGACY:   ; %bb.0:
+; GFX6_LEGACY-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0
+; GFX6_LEGACY-NEXT:s_mov_b32 s7, 0x100f000
+; GFX6_LEGACY-NEXT:s_mov_b32 s10, 0
+; GFX6_LEGACY-NEXT:s_mov_b32 s11, s7
+; GFX6_LEGACY-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX6_LEGACY-NEXT:s_waitcnt lgkmcnt(0)
+; GFX6_LEGACY-NEXT:s_mov_b64 s[8:9], s[2:3]
+; GFX6_LEGACY-NEXT:v_mov_b32_e32 v1, 0
+; GFX6_LEGACY-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc
+; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0)
+; GFX6_LEGACY-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 
offset:4 glc
+; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0)
+; GFX6_LEGACY-NEXT:s_mov_b32 s6, -1
+; GFX6_LEGACY-NEXT:s_mov_b32 s4, s0
+; GFX6_LEGACY-NEXT:s_mov_b32 s5, s1
+; GFX6_LEGACY-NEXT:v_add_i32_e32 v0, vcc, v2, v0
+; GFX6_LEGACY-NEXT:buffer_store_dword v0, off, s[4:7], 0
+; GFX6_LEGACY-NEXT:s_endpgm
+  %tid = call i32 @llvm.amdgcn.workitem.id.x()
+  %gep = getelementptr inbounds i32, ptr addrspace(1) %in, i32 %tid
+  %b_ptr = getelementptr i32, ptr addrspace(1) %gep, i32 1
+  %a = load volatile i32, ptr addrspace(1) %gep
+  %b = load volatile i32, ptr addrspace(1) %b_ptr
+  %result = add i32 %a, %b
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add 
tests below this line:
+; GFX6: {{.*}}
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 0fe4d337a5bd7..41e47e834b723 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -290,3 +290,209 @@ define ptr @fold_mul24_into_mad(ptr %base, i64 %a, i64 
%b) {
   %gep = getelementptr inbounds i8, ptr %base, i64 %mul
   ret ptr %gep
 }
+
+; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectGlobalSAddr.
+define amdgpu_kernel void @uniform_base_varying_offset_imm(ptr addrspace(1) 
%p) {
+; GFX942_PTRADD-LABEL: uniform_base_varying_offset_imm:
+; GFX942_PTRADD:   ; %bb.0: ; %entry
+; GFX942_PTRADD-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0
+; GFX942_PTRADD-NEXT:v_and_b32_e32 v0, 0x3ff, v0
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, 0
+; GFX942_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0
+; GFX942_PTRADD-NEXT:v_mov_b32_e32 v2, 1
+; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0)
+; GFX942_PTRADD-NEXT:v_lshl_add_u64 v[0:1], s[0:1], 0, v[0:1]
+; GFX942_PTRAD

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146075

>From 18dcde6a8c7bddfbd56077dc81b0b80535cc49a1 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering the PTRADD
to a disjoint OR is preferable: often, a single 32-bit OR instruction
suffices where we'd otherwise need a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which we will soon need to fold offsets
into FLAT instructions. Currently, disjoint ORs can still be used for
offset folding, so that part of the logic can't be tested.

The PR contains a hacky workaround for a situation where an AssertAlign
operand of a PTRADD is not DAGCombined before the PTRADD, causing the
PTRADD to be turned into a disjoint OR although reassociating it with
the operand of the AssertAlign would be better. This wouldn't be a
problem if the DAGCombiner ensured that a node is only processed after
all its operands have been processed.

For SWDEV-516125.
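A hedged sketch of the underlying identity (plain Python, not LLVM code): addition and bitwise OR compute the same value exactly when the two operands share no set bits, which is what the `disjoint` flag asserts:

```python
def add_is_or(base: int, off: int) -> bool:
    """ADD and OR agree precisely when base & off == 0 (no carries occur)."""
    return (base + off) == (base | off)

aligned = 0x2340 & ~0xF   # low 4 bits cleared, e.g. by an llvm.ptrmask
assert add_is_or(aligned, 0x4)     # offset fits in the zeroed bits
assert not add_is_or(0x7, 0x5)     # overlapping bits: the carry matters
```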
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 56 ++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a1af50dac7e54..ec7002bdd9f43 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -15822,6 +15822,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode 
*N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the 
ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast<MemSDNode>(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;
+});
+
+if (!TransformCanBreakAddrMode)
+  return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint);
+  }
+
   if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse())
 return SDValue();
 
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 199c1f61d2522..7d7fe141e5440 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) {
 
 ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in 
the
 ; assertalign DAG combine.
-define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  #0 {
+define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) {
 ; GFX942-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX942:   ; %bb.0:
 ; GFX942-NEXT:v_mov_b32_e32 v0, 0
@@ -415,6 +415,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
+  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign no

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146076

>From 8710de705f09d90f166f82c1733620b2c8581306 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 05:38:52 -0400
Subject: [PATCH 1/3] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by
 default

Also removes the command line option to control this feature.

There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
  since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
  some transformations that were sometimes harmful.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  10 +-
 .../identical-subrange-spill-infloop.ll   | 352 +++---
 .../AMDGPU/infer-addrspace-flat-atomic.ll |  14 +-
 llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll  |   8 +-
 .../AMDGPU/lower-module-lds-via-hybrid.ll |   4 +-
 .../AMDGPU/lower-module-lds-via-table.ll  |  16 +-
 .../match-perm-extract-vector-elt-bug.ll  |  22 +-
 llvm/test/CodeGen/AMDGPU/memmove-var-size.ll  |  16 +-
 .../AMDGPU/preload-implicit-kernargs.ll   |   6 +-
 .../AMDGPU/promote-constOffset-to-imm.ll  |   8 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll |   7 +-
 .../AMDGPU/ptradd-sdag-optimizations.ll   |  94 ++---
 .../AMDGPU/ptradd-sdag-undef-poison.ll|   6 +-
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   |  27 +-
 llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll |  29 +-
 15 files changed, 310 insertions(+), 309 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a1af50dac7e54..05ab745171f6d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -63,14 +63,6 @@ static cl::opt<bool> UseDivergentRegisterIndexing(
 cl::desc("Use indirect register addressing for divergent indexes"),
 cl::init(false));
 
-// TODO: This option should be removed once we switch to always using PTRADD in
-// the SelectionDAG.
-static cl::opt<bool> UseSelectionDAGPTRADD(
-"amdgpu-use-sdag-ptradd", cl::Hidden,
-cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the "
- "SelectionDAG ISel"),
-cl::init(false));
-
 static bool denormalModeIsFlushAllF32(const MachineFunction &MF) {
  const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
   return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign();
@@ -11252,7 +11244,7 @@ static bool isNoUnsignedWrap(SDValue Addr) {
 
 bool SITargetLowering::shouldPreservePtrArith(const Function &F,
   EVT PtrVT) const {
-  return UseSelectionDAGPTRADD && PtrVT == MVT::i64;
+  return PtrVT == MVT::i64;
 }
 
 bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll 
b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 2c03113e8af47..805cdd37d6e70 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -6,96 +6,150 @@ define void @main(i1 %arg) #0 {
 ; CHECK:   ; %bb.0: ; %bb
 ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill
+; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte 
Folded Spill
 ; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v5, s30, 0
-; CHECK-NEXT:v_writelane_b32 v5, s31, 1
-; CHECK-NEXT:v_writelane_b32 v5, s36, 2
-; CHECK-NEXT:v_writelane_b32 v5, s37, 3
-; CHECK-NEXT:v_writelane_b32 v5, s38, 4
-; CHECK-NEXT:v_writelane_b32 v5, s39, 5
-; CHECK-NEXT:v_writelane_b32 v5, s48, 6
-; CHECK-NEXT:v_writelane_b32 v5, s49, 7
-; CHECK-NEXT:v_writelane_b32 v5, s50, 8
-; CHECK-NEXT:v_writelane_b32 v5, s51, 9
-; CHECK-NEXT:v_writelane_b32 v5, s52, 10
-; CHECK-NEXT:v_writelane_b32 v5, s53, 11
-; CHECK-NEXT:v_writelane_b32 v5, s54, 12
-; CHECK-NEXT:v_writelane_b32 v5, s55, 13
-; CHECK-NEXT:s_getpc_b64 s[24:25]
-; CHECK-NEXT:v_writelane_b32 v5, s64, 14
-; CHECK-NEXT:s_movk_i32 s4, 0xf0
-; CHECK-NEXT:s_mov_b32 s5, s24
-; CHECK-NEXT:v_writelane_b32 v5, s65, 15
-; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0
-; CHECK-NEXT:s_mov_b64 s[4:5], 0
-; CHECK-NEXT:v_writelane_b32 v5, s66, 16
-; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0
-; CHECK-NEXT:v_writelane_b32 v5, s67, 17
-; CHECK-NEXT:s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:s_movk_i32 s6, 0x130
-; CHECK-NEXT:s_mov_b32 s7, s24
-; CHECK-NEXT:v_writelane_b32 v5

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-09-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/146075

>From 18dcde6a8c7bddfbd56077dc81b0b80535cc49a1 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Fri, 27 Jun 2025 04:23:50 -0400
Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR

If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.

This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which we will soon need to fold offsets
into FLAT instructions. Currently, disjoint ORs can still be used for
offset folding, so that part of the logic can't be tested.

The PR contains a hacky workaround for a situation where an AssertAlign
operand of a PTRADD is not DAGCombined before the PTRADD, causing the
PTRADD to be turned into a disjoint OR although reassociating it with
the operand of the AssertAlign would be better. This wouldn't be a
problem if the DAGCombiner ensured that a node is only processed after
all its operands have been processed.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 56 ++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a1af50dac7e54..ec7002bdd9f43 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -15822,6 +15822,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N,
   return Folded;
   }
 
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+  N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+  DAG.haveNoCommonBitsSet(N0, N1)) {
+bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+  if (auto *LoadStore = dyn_cast<MemSDNode>(User);
+  LoadStore && LoadStore->getBasePtr().getNode() == N) {
+unsigned AS = LoadStore->getAddressSpace();
+// Currently, we only really need ptradds to fold offsets into flat
+// memory instructions.
+if (AS != AMDGPUAS::FLAT_ADDRESS)
+  return false;
+TargetLoweringBase::AddrMode AM;
+AM.HasBaseReg = true;
+EVT VT = LoadStore->getMemoryVT();
+Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+  }
+  return false;
+});
+
+if (!TransformCanBreakAddrMode)
+  return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint);
+  }
+
   if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse())
 return SDValue();
 
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index 199c1f61d2522..7d7fe141e5440 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) {
 
; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in the
 ; assertalign DAG combine.
-define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr)  #0 {
+define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) {
 ; GFX942-LABEL: llvm_amdgcn_queue_ptr:
 ; GFX942:   ; %bb.0:
 ; GFX942-NEXT:v_mov_b32_e32 v0, 0
@@ -415,6 +415,60 @@ entry:
   ret void
 }
 
+; Check that ptradds can be lowered to disjoint ORs.
+define ptr @gep_disjoint_or(ptr %base) {
+; GFX942-LABEL: gep_disjoint_or:
+; GFX942:   ; %bb.0:
+; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4
+; GFX942-NEXT:s_setpc_b64 s[30:31]
  %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xfffffffffffffff0)
+  %gep = getelementptr nuw inbounds i8, ptr %p, i64 4
+  ret ptr %gep
+}
+
+; Check that AssertAlign no

[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)

2025-09-12 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/158013
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)

2025-09-12 Thread Kareem Ergawy via llvm-branch-commits

https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/155993


[llvm-branch-commits] [mlir] 4a4cc8c - Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194)"

2025-09-12 Thread via llvm-branch-commits

Author: Mehdi Amini
Date: 2025-09-11T13:33:57+01:00
New Revision: 4a4cc8c0fcec63de73d6b14d258204593c181b79

URL: 
https://github.com/llvm/llvm-project/commit/4a4cc8c0fcec63de73d6b14d258204593c181b79
DIFF: 
https://github.com/llvm/llvm-project/commit/4a4cc8c0fcec63de73d6b14d258204593c181b79.diff

LOG: Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194)"

This reverts commit c84f34bcd8c7fb6d5038b3f52da8c7be64ad5189.

Added: 


Modified: 
llvm/include/llvm/Support/Debug.h
llvm/include/llvm/Support/DebugLog.h
llvm/unittests/Support/DebugLogTest.cpp
mlir/lib/Dialect/Transform/IR/TransformOps.cpp

Removed: 




diff  --git a/llvm/include/llvm/Support/Debug.h b/llvm/include/llvm/Support/Debug.h
index b73f2d7c8b852..a7795d403721c 100644
--- a/llvm/include/llvm/Support/Debug.h
+++ b/llvm/include/llvm/Support/Debug.h
@@ -44,6 +44,11 @@ class raw_ostream;
 /// level, return false.
 LLVM_ABI bool isCurrentDebugType(const char *Type, int Level = 0);
 
+/// Overload allowing to swap the order of the Type and Level arguments.
+LLVM_ABI inline bool isCurrentDebugType(int Level, const char *Type) {
+  return isCurrentDebugType(Type, Level);
+}
+
 /// setCurrentDebugType - Set the current debug type, as if the -debug-only=X
 /// option were specified.  Note that DebugFlag also needs to be set to true 
for
 /// debug output to be produced.

diff  --git a/llvm/include/llvm/Support/DebugLog.h b/llvm/include/llvm/Support/DebugLog.h
index 33586dd275573..dce706e196bde 100644
--- a/llvm/include/llvm/Support/DebugLog.h
+++ b/llvm/include/llvm/Support/DebugLog.h
@@ -19,55 +19,52 @@
 namespace llvm {
 #ifndef NDEBUG
 
-/// LDBG() is a macro that can be used as a raw_ostream for debugging.
-/// It will stream the output to the dbgs() stream, with a prefix of the
-/// debug type and the file and line number. A trailing newline is added to the
-/// output automatically. If the streamed content contains a newline, the 
prefix
-/// is added to each beginning of a new line. Nothing is printed if the debug
-/// output is not enabled or the debug type does not match.
-///
-/// E.g.,
-///   LDBG() << "Bitset contains: " << Bitset;
-/// is equivalent to
-///   LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" <<
-///   __LINE__ << " "
-///  << "Bitset contains: " << Bitset << "\n");
-///
+// LDBG() is a macro that can be used as a raw_ostream for debugging.
+// It will stream the output to the dbgs() stream, with a prefix of the
+// debug type and the file and line number. A trailing newline is added to the
+// output automatically. If the streamed content contains a newline, the prefix
+// is added to each beginning of a new line. Nothing is printed if the debug
+// output is not enabled or the debug type does not match.
+//
+// E.g.,
+//   LDBG() << "Bitset contains: " << Bitset;
+// is somehow equivalent to
+//   LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" <<
+//   __LINE__ << " "
+//  << "Bitset contains: " << Bitset << "\n");
+//
 // An optional `level` argument can be provided to control the verbosity of the
-/// output. The default level is 1, and is in increasing level of verbosity.
-///
-/// The `level` argument can be a literal integer, or a macro that evaluates to
-/// an integer.
-///
-/// An optional `type` argument can be provided to control the debug type. The
-/// default type is DEBUG_TYPE. The `type` argument can be a literal string, or
-/// a macro that evaluates to a string.
-///
-/// E.g.,
-///   LDBG(2) << "Bitset contains: " << Bitset;
-///   LDBG("debug_type") << "Bitset contains: " << Bitset;
-///   LDBG("debug_type", 2) << "Bitset contains: " << Bitset;
+// output. The default level is 1, and is in increasing level of verbosity.
+//
+// The `level` argument can be a literal integer, or a macro that evaluates to
+// an integer.
+//
+// An optional `type` argument can be provided to control the debug type. The
+// default type is DEBUG_TYPE. The `type` argument can be a literal string, or 
a
+// macro that evaluates to a string.
 #define LDBG(...) _GET_LDBG_MACRO(__VA_ARGS__)(__VA_ARGS__)
 
-/// LDBG_OS() is a macro that behaves like LDBG() but instead of directly using
-/// it to stream the output, it takes a callback function that will be called
-/// with a raw_ostream.
-/// This is useful when you need to pass a `raw_ostream` to a helper function 
to
-/// be able to print (when the `<<` operator is not available).
-///
-/// E.g.,
-///   LDBG_OS([&] (raw_ostream &Os) {
-/// Os << "Pass Manager contains: ";
-/// pm.printAsTextual(Os);
-///   });
-///
-/// Just like LDBG(), it optionally accepts a `level` and `type` arguments.
-/// E.g.,
-///   LDBG_OS(2, [&] (raw_ostream &Os) { ... });
-///   LDBG_OS("debug_type", [&] (raw_ostream &Os) { ... });
-///   LDBG_OS("debug_type", 2, [&] (raw_ostream &Os) { ... });
-///
-#define LDBG_OS(

[llvm-branch-commits] [mlir] [mlir][Transforms] Simplify `ConversionPatternRewriter::replaceOp` implementation (PR #158075)

2025-09-12 Thread Matthias Springer via llvm-branch-commits

https://github.com/matthias-springer created 
https://github.com/llvm/llvm-project/pull/158075

Depends on #158067.


>From 8113b1d6c7600dec5ccf93d6c3fe356c08dbc067 Mon Sep 17 00:00:00 2001
From: Matthias Springer 
Date: Wed, 3 Sep 2025 07:35:47 +
Subject: [PATCH] proto

---
 .../Transforms/Utils/DialectConversion.cpp| 52 +++
 1 file changed, 20 insertions(+), 32 deletions(-)

diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp
index 4b483c32ecef9..52369c18faa61 100644
--- a/mlir/lib/Transforms/Utils/DialectConversion.cpp
+++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp
@@ -1618,6 +1618,8 @@ Block *ConversionPatternRewriterImpl::applySignatureConversion(
 if (!inputMap) {
   // This block argument was dropped and no replacement value was provided.
   // Materialize a replacement value "out of thin air".
+  // Note: Materialization must be built here because we cannot find a
+  // valid insertion point in the new block. (Will point to the old block.)
   Value mat =
   buildUnresolvedMaterialization(
   MaterializationKind::Source,
@@ -1709,8 +1711,9 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   // mapping. This includes cached materializations. We try to reuse those
   // instead of generating duplicate IR.
   ValueVector repl = lookupOrNull(value, value.getType());
-  if (!repl.empty())
+  if (!repl.empty()) {
 return repl.front();
+  }
 
   // Check if the value is dead. No replacement value is needed in that case.
   // This is an approximate check that may have false negatives but does not
@@ -1718,22 +1721,14 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   // building source materializations that are never used and that fold away.)
   if (llvm::all_of(value.getUsers(),
[&](Operation *op) { return replacedOps.contains(op); }) &&
-  !mapping.isMappedTo(value))
+  !mapping.isMappedTo(value)) {
 return Value();
+  }
 
   // No replacement value was found. Get the latest replacement value
   // (regardless of the type) and build a source materialization to the
   // original type.
   repl = lookupOrNull(value);
-  if (repl.empty()) {
-// No replacement value is registered in the mapping. This means that the
-// value is dropped and no longer needed. (If the value were still needed,
-// a source materialization producing a replacement value "out of thin air"
-// would have already been created during `replaceOp` or
-// `applySignatureConversion`.)
-return Value();
-  }
-
   // Note: `computeInsertPoint` computes the "earliest" insertion point at
   // which all values in `repl` are defined. It is important to emit the
   // materialization at that location because the same materialization may be
@@ -1741,13 +1736,19 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   // in the conversion value mapping.) The insertion point of the
   // materialization must be valid for all future users that may be created
   // later in the conversion process.
-  Value castValue =
-  buildUnresolvedMaterialization(MaterializationKind::Source,
- computeInsertPoint(repl), value.getLoc(),
- /*valuesToMap=*/repl, /*inputs=*/repl,
- /*outputTypes=*/value.getType(),
- /*originalType=*/Type(), converter)
-  .front();
+  OpBuilder::InsertPoint ip;
+  if (repl.empty()) {
+ip = computeInsertPoint(value);
+  } else {
+ip = computeInsertPoint(repl);
+  }
+  Value castValue = buildUnresolvedMaterialization(
+MaterializationKind::Source, ip, value.getLoc(),
+/*valuesToMap=*/repl, /*inputs=*/repl,
+/*outputTypes=*/value.getType(),
+/*originalType=*/Type(), converter,
+/*isPureTypeConversion=*/!repl.empty())
+.front();
   return castValue;
 }
 
@@ -1897,21 +1898,8 @@ void ConversionPatternRewriterImpl::replaceOp(
   }
 
   // Create mappings for each of the new result values.
-  for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) {
-if (repl.empty()) {
-  // This result was dropped and no replacement value was provided.
-  // Materialize a replacement value "out of thin air".
-  buildUnresolvedMaterialization(
-  MaterializationKind::Source, computeInsertPoint(result),
-  result.getLoc(), /*valuesToMap=*/{result}, /*inputs=*/ValueRange(),
-  /*outputTypes=*/result.getType(), /*originalType=*/Type(),
-  currentTypeConverter, /*isPureTypeConversion=*/false);
-  continue;
-}
-
-// Remap result to replacement value.
+  for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults()))
 mapp

[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (PR #158076)

2025-09-12 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#158076** 👈 (view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/158076)
* **#157978** (https://app.graphite.dev/github/pr/llvm/llvm-project/157978)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/158076


[llvm-branch-commits] [Remarks] BitstreamRemarkParser: Refactor error handling (PR #156511)

2025-09-12 Thread Jon Roelofs via llvm-branch-commits

https://github.com/jroelofs approved this pull request.

LGTM with some nits. Tests would be good, but I don't think we should block 
this on improving things there.

https://github.com/llvm/llvm-project/pull/156511


[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-12 Thread Jon Roelofs via llvm-branch-commits


@@ -82,20 +82,26 @@ struct LLVMRemarkSetupFormatError
   LLVMRemarkSetupFormatError>::LLVMRemarkSetupErrorInfo;
 };
 
-/// Setup optimization remarks that output to a file.
+/// Setup optimization remarks that output to a file. The returned
+/// ToolOutputFile must be kept open for writing until
+/// \ref finalizeLLVMOptimizationRemarks() is called.
LLVM_ABI Expected<std::unique_ptr<ToolOutputFile>> setupLLVMOptimizationRemarks(
 LLVMContext &Context, StringRef RemarksFilename, StringRef RemarksPasses,
 StringRef RemarksFormat, bool RemarksWithHotness,
std::optional<uint64_t> RemarksHotnessThreshold = 0);
 
 /// Setup optimization remarks that output directly to a raw_ostream.
-/// \p OS is managed by the caller and should be open for writing as long as \p
-/// Context is streaming remarks to it.
+/// \p OS is managed by the caller and must be open for writing until
+/// \ref finalizeLLVMOptimizationRemarks() is called.
 LLVM_ABI Error setupLLVMOptimizationRemarks(
 LLVMContext &Context, raw_ostream &OS, StringRef RemarksPasses,
 StringRef RemarksFormat, bool RemarksWithHotness,
std::optional<uint64_t> RemarksHotnessThreshold = 0);
 
+/// Finalize optimization remarks. This must be called before closing the
+/// (file) stream that was used to setup the remarks.
+LLVM_ABI void finalizeLLVMOptimizationRemarks(LLVMContext &Context);

jroelofs wrote:

Does the "resource" that this closes out have the same lifetime as the 
`ToolOutputFile`? If so, maybe this API could be simplified by moving this 
finalization into a subclass's dtor then you can't forget it.

https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (PR #158076)

2025-09-12 Thread Matt Arsenault via llvm-branch-commits


@@ -1296,6 +1303,157 @@ struct AAAMDGPUNoAGPR
 
 const char AAAMDGPUNoAGPR::ID = 0;
 
+/// An abstract attribute to propagate the function attribute
+/// "amdgpu-cluster-dims" from kernel entry functions to device functions.
+struct AAAMDGPUClusterDims
+: public StateWrapper {
+  using Base = StateWrapper;
+  AAAMDGPUClusterDims(const IRPosition &IRP, Attributor &A) : Base(IRP) {}
+
+  /// Create an abstract attribute view for the position \p IRP.
+  static AAAMDGPUClusterDims &createForPosition(const IRPosition &IRP,
+Attributor &A);
+
+  /// See AbstractAttribute::getName().
+  StringRef getName() const override { return "AAAMDGPUClusterDims"; }
+
+  /// See AbstractAttribute::getIdAddr().
+  const char *getIdAddr() const override { return &ID; }
+
+  /// This function should return true if the type of the \p AA is
+  /// AAAMDGPUClusterDims.
+  static bool classof(const AbstractAttribute *AA) {
+return (AA->getIdAddr() == &ID);
+  }
+
+  virtual const AMDGPU::ClusterDimsAttr &getClusterDims() const = 0;
+
+  /// Unique ID (due to the unique address)
+  static const char ID;
+};
+
+const char AAAMDGPUClusterDims::ID = 0;
+
+struct AAAMDGPUClusterDimsFunction : public AAAMDGPUClusterDims {
+  AAAMDGPUClusterDimsFunction(const IRPosition &IRP, Attributor &A)
+  : AAAMDGPUClusterDims(IRP, A) {}
+
+  void initialize(Attributor &A) override {
+Function *F = getAssociatedFunction();
+assert(F && "empty associated function");
+
+Attr = AMDGPU::ClusterDimsAttr::get(*F);
+
+// No matter what a kernel function has, it is final.
+if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) {
+  if (Attr.isUnknown())
+indicatePessimisticFixpoint();
+  else
+indicateOptimisticFixpoint();
+}
+  }
+
+  const std::string getAsStr(Attributor *A) const override {
+if (!getAssumed() || Attr.isUnknown())
+  return "unknown";
+if (Attr.isNoCluster())
+  return "no";
+if (Attr.isVariableedDims())

arsenm wrote:

Find and replace typo? "isVariableedDims"

https://github.com/llvm/llvm-project/pull/158076


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)

2025-09-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/146075


[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)

2025-09-12 Thread Jon Roelofs via llvm-branch-commits


@@ -232,43 +221,40 @@ void BitstreamRemarkSerializerHelper::setupBlockInfo() {
 }
 
 void BitstreamRemarkSerializerHelper::emitMetaBlock(
-uint64_t ContainerVersion, std::optional RemarkVersion,
-std::optional StrTab,
 std::optional Filename) {
   // Emit the meta block
   Bitstream.EnterSubblock(META_BLOCK_ID, 3);
 
   // The container version and type.
   R.clear();
   R.push_back(RECORD_META_CONTAINER_INFO);
-  R.push_back(ContainerVersion);
+  R.push_back(CurrentContainerVersion);
   R.push_back(static_cast(ContainerType));
   Bitstream.EmitRecordWithAbbrev(RecordMetaContainerInfoAbbrevID, R);
 
   switch (ContainerType) {
-  case BitstreamRemarkContainerType::SeparateRemarksMeta:
-assert(StrTab != std::nullopt && *StrTab != nullptr);
-emitMetaStrTab(**StrTab);
+  case BitstreamRemarkContainerType::RemarksFileExternal:
 assert(Filename != std::nullopt);
 emitMetaExternalFile(*Filename);
 break;
-  case BitstreamRemarkContainerType::SeparateRemarksFile:
-assert(RemarkVersion != std::nullopt);
-emitMetaRemarkVersion(*RemarkVersion);
-break;
-  case BitstreamRemarkContainerType::Standalone:
-assert(RemarkVersion != std::nullopt);
-emitMetaRemarkVersion(*RemarkVersion);
-assert(StrTab != std::nullopt && *StrTab != nullptr);
-emitMetaStrTab(**StrTab);
+  case BitstreamRemarkContainerType::RemarksFile:
+emitMetaRemarkVersion(CurrentRemarkVersion);
 break;
   }
 
   Bitstream.ExitBlock();

jroelofs wrote:

likewise

https://github.com/llvm/llvm-project/pull/156715


[llvm-branch-commits] [llvm] [lit] Implement ulimit builtin (PR #157958)

2025-09-12 Thread via llvm-branch-commits

https://github.com/cmtice approved this pull request.


https://github.com/llvm/llvm-project/pull/157958


[llvm-branch-commits] [llvm] [DA] Fix Strong SIV test for symbolic coefficients and deltas (#149977) (PR #157738)

2025-09-12 Thread Ryotaro Kasuga via llvm-branch-commits

kasuga-fj wrote:

Not yet. Feel free to go ahead if you’d like.

https://github.com/llvm/llvm-project/pull/157738


[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)

2025-09-12 Thread Michael Klemm via llvm-branch-commits

https://github.com/mjklemm approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/156610


[llvm-branch-commits] [mlir] [mlir][Transforms] Simplify `ConversionPatternRewriter::replaceOp` implementation (PR #158075)

2025-09-12 Thread Matthias Springer via llvm-branch-commits

https://github.com/matthias-springer updated 
https://github.com/llvm/llvm-project/pull/158075

>From d7d40567d7c5aa55210d965f01773fbc535e50ee Mon Sep 17 00:00:00 2001
From: Matthias Springer 
Date: Wed, 3 Sep 2025 07:35:47 +
Subject: [PATCH] proto

---
 .../Transforms/Utils/DialectConversion.cpp| 46 +++
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp
index 4b483c32ecef9..65bed7d85ec66 100644
--- a/mlir/lib/Transforms/Utils/DialectConversion.cpp
+++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp
@@ -1618,6 +1618,8 @@ Block *ConversionPatternRewriterImpl::applySignatureConversion(
 if (!inputMap) {
   // This block argument was dropped and no replacement value was provided.
   // Materialize a replacement value "out of thin air".
+  // Note: Materialization must be built here because we cannot find a
+  // valid insertion point in the new block. (Will point to the old block.)
   Value mat =
   buildUnresolvedMaterialization(
   MaterializationKind::Source,
@@ -1725,15 +1727,6 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   // (regardless of the type) and build a source materialization to the
   // original type.
   repl = lookupOrNull(value);
-  if (repl.empty()) {
-// No replacement value is registered in the mapping. This means that the
-// value is dropped and no longer needed. (If the value were still needed,
-// a source materialization producing a replacement value "out of thin air"
-// would have already been created during `replaceOp` or
-// `applySignatureConversion`.)
-return Value();
-  }
-
   // Note: `computeInsertPoint` computes the "earliest" insertion point at
   // which all values in `repl` are defined. It is important to emit the
   // materialization at that location because the same materialization may be
@@ -1741,13 +1734,19 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue(
   // in the conversion value mapping.) The insertion point of the
   // materialization must be valid for all future users that may be created
   // later in the conversion process.
-  Value castValue =
-  buildUnresolvedMaterialization(MaterializationKind::Source,
- computeInsertPoint(repl), value.getLoc(),
- /*valuesToMap=*/repl, /*inputs=*/repl,
- /*outputTypes=*/value.getType(),
- /*originalType=*/Type(), converter)
-  .front();
+  OpBuilder::InsertPoint ip;
+  if (repl.empty()) {
+ip = computeInsertPoint(value);
+  } else {
+ip = computeInsertPoint(repl);
+  }
+  Value castValue = buildUnresolvedMaterialization(
+MaterializationKind::Source, ip, value.getLoc(),
+/*valuesToMap=*/repl, /*inputs=*/repl,
+/*outputTypes=*/value.getType(),
+/*originalType=*/Type(), converter,
+/*isPureTypeConversion=*/!repl.empty())
+.front();
   return castValue;
 }
 
@@ -1897,21 +1896,8 @@ void ConversionPatternRewriterImpl::replaceOp(
   }
 
   // Create mappings for each of the new result values.
-  for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) {
-if (repl.empty()) {
-  // This result was dropped and no replacement value was provided.
-  // Materialize a replacement value "out of thin air".
-  buildUnresolvedMaterialization(
-  MaterializationKind::Source, computeInsertPoint(result),
-  result.getLoc(), /*valuesToMap=*/{result}, /*inputs=*/ValueRange(),
-  /*outputTypes=*/result.getType(), /*originalType=*/Type(),
-  currentTypeConverter, /*isPureTypeConversion=*/false);
-  continue;
-}
-
-// Remap result to replacement value.
+  for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults()))
 mapping.map(static_cast(result), std::move(repl));
-  }
 
   appendRewrite(op, currentTypeConverter);
   // Mark this operation and all nested ops as replaced.



[llvm-branch-commits] [llvm] [DA] Fix Strong SIV test for symbolic coefficients and deltas (#149977) (PR #157738)

2025-09-12 Thread Ryotaro Kasuga via llvm-branch-commits

https://github.com/kasuga-fj edited 
https://github.com/llvm/llvm-project/pull/157738


[llvm-branch-commits] [llvm] [libc++] Test triggering a benchmarking job comment (PR #158138)

2025-09-12 Thread Louis Dionne via llvm-branch-commits

ldionne wrote:

/libcxx-bot benchmark libcxx/test/benchmarks/join_view.bench.cpp 
libcxx/test/benchmarks/hash.bench.cpp

https://github.com/llvm/llvm-project/pull/158138


[llvm-branch-commits] [llvm] Add IR and codegen support for deactivation symbols. (PR #133536)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133536

>From f4c61b403c8a2c649741bae983196922143db44e Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Wed, 10 Sep 2025 18:02:38 -0700
Subject: [PATCH 1/2] Tweak LangRef

Created using spr 1.3.6-beta.1
---
 llvm/docs/LangRef.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 10586f03cff8e..5380413aec892 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3098,7 +3098,8 @@ Deactivation Symbol Operand Bundles
 A ``"deactivation-symbol"`` operand bundle is valid on the following
 instructions (AArch64 only):
 
-- Call to a normal function with ``notail`` attribute.
+- Call to a normal function with ``notail`` attribute and a first argument and
+  return value of type ``ptr``.
 - Call to ``llvm.ptrauth.sign`` or ``llvm.ptrauth.auth`` intrinsics.
 
 This operand bundle specifies that if the deactivation symbol is defined

>From 0c2d97be43360d18f6e674bde048298a450a4bda Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Thu, 11 Sep 2025 12:39:33 -0700
Subject: [PATCH 2/2] Add combine check

Created using spr 1.3.6-beta.1
---
 .../InstCombine/InstCombineCalls.cpp  | 10 +++
 .../InstCombine/ptrauth-intrinsics.ll | 28 +++
 2 files changed, 38 insertions(+)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 42b65dde67255..6550c6213dee5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -3052,6 +3052,11 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
   }
   case Intrinsic::ptrauth_auth:
   case Intrinsic::ptrauth_resign: {
+// We don't support this optimization on intrinsic calls with deactivation
+// symbols, which are represented using operand bundles.
+if (II->hasOperandBundles())
+  break;
+
 // (sign|resign) + (auth|resign) can be folded by omitting the middle
 // sign+auth component if the key and discriminator match.
 bool NeedSign = II->getIntrinsicID() == Intrinsic::ptrauth_resign;
@@ -3063,6 +3068,11 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
 // whatever we replace this sequence with.
 Value *AuthKey = nullptr, *AuthDisc = nullptr, *BasePtr;
 if (const auto *CI = dyn_cast(Ptr)) {
+  // We don't support this optimization on intrinsic calls with deactivation
+  // symbols, which are represented using operand bundles.
+  if (CI->hasOperandBundles())
+break;
+
   BasePtr = CI->getArgOperand(0);
   if (CI->getIntrinsicID() == Intrinsic::ptrauth_sign) {
 if (CI->getArgOperand(1) != Key || CI->getArgOperand(2) != Disc)
diff --git a/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll b/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll
index 208e162ac9416..09d9649b09cc1 100644
--- a/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll
@@ -160,6 +160,34 @@ define i64 @test_ptrauth_resign_ptrauth_constant(ptr %p) {
   ret i64 %authed
 }
 
+@ds = external global i8
+
+define i64 @test_ptrauth_nop_ds1(ptr %p) {
+; CHECK-LABEL: @test_ptrauth_nop_ds1(
+; CHECK-NEXT:    [[TMP0:%.*]] = ptrtoint ptr [[P:%.*]] to i64
+; CHECK-NEXT:    [[SIGNED:%.*]] = call i64 @llvm.ptrauth.sign(i64 [[TMP0]], i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ]
+; CHECK-NEXT:    [[AUTHED:%.*]] = call i64 @llvm.ptrauth.auth(i64 [[SIGNED]], i32 1, i64 1234)
+; CHECK-NEXT:    ret i64 [[AUTHED]]
+;
+  %tmp0 = ptrtoint ptr %p to i64
+  %signed = call i64 @llvm.ptrauth.sign(i64 %tmp0, i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ]
+  %authed = call i64 @llvm.ptrauth.auth(i64 %signed, i32 1, i64 1234)
+  ret i64 %authed
+}
+
+define i64 @test_ptrauth_nop_ds2(ptr %p) {
+; CHECK-LABEL: @test_ptrauth_nop_ds2(
+; CHECK-NEXT:    [[TMP0:%.*]] = ptrtoint ptr [[P:%.*]] to i64
+; CHECK-NEXT:    [[SIGNED:%.*]] = call i64 @llvm.ptrauth.sign(i64 [[TMP0]], i32 1, i64 1234)
+; CHECK-NEXT:    [[AUTHED:%.*]] = call i64 @llvm.ptrauth.auth(i64 [[SIGNED]], i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ]
+; CHECK-NEXT:    ret i64 [[AUTHED]]
+;
+  %tmp0 = ptrtoint ptr %p to i64
+  %signed = call i64 @llvm.ptrauth.sign(i64 %tmp0, i32 1, i64 1234)
+  %authed = call i64 @llvm.ptrauth.auth(i64 %signed, i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ]
+  ret i64 %authed
+}
+
 declare i64 @llvm.ptrauth.auth(i64, i32, i64)
 declare i64 @llvm.ptrauth.sign(i64, i32, i64)
 declare i64 @llvm.ptrauth.resign(i64, i32, i64, i32, i64)

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
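
For readers skimming the archive: pieced together from the test cases quoted above (a sketch for orientation, not part of the patch), the operand bundle that now blocks the sign/auth fold looks like this in IR:

```llvm
@ds = external global i8

define i64 @sign_then_auth(ptr %p) {
  %addr = ptrtoint ptr %p to i64
  ; The bundle ties the signing to the deactivation symbol @ds, so
  ; InstCombine must not cancel this sign against the auth below.
  %signed = call i64 @llvm.ptrauth.sign(i64 %addr, i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ]
  %authed = call i64 @llvm.ptrauth.auth(i64 %signed, i32 1, i64 1234)
  ret i64 %authed
}

declare i64 @llvm.ptrauth.sign(i64, i32, i64)
declare i64 @llvm.ptrauth.auth(i64, i32, i64)
```

With the patch applied, both test functions above check that the sign/auth pair survives InstCombine whenever either call carries the bundle.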


[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)

2025-09-12 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133537

>From e728f3444624a5f47f0af84c21fb3a584f3e05b7 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Fri, 1 Aug 2025 17:27:41 -0700
Subject: [PATCH 1/5] Add verifier check

Created using spr 1.3.6-beta.1
---
 llvm/lib/IR/Verifier.cpp   | 5 +
 llvm/test/Verifier/ptrauth-constant.ll | 6 ++
 2 files changed, 11 insertions(+)
 create mode 100644 llvm/test/Verifier/ptrauth-constant.ll

diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 3ff9895e161c4..3478c2c450ae7 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -2627,6 +2627,11 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) {
 
   Check(CPA->getDiscriminator()->getBitWidth() == 64,
 "signed ptrauth constant discriminator must be i64 constant integer");
+
+  Check(isa<GlobalValue>(CPA->getDeactivationSymbol()) ||
+            CPA->getDeactivationSymbol()->isNullValue(),
+        "signed ptrauth constant deactivation symbol must be a global value "
+        "or null");
 }
 
 bool Verifier::verifyAttributeCount(AttributeList Attrs, unsigned Params) {
diff --git a/llvm/test/Verifier/ptrauth-constant.ll b/llvm/test/Verifier/ptrauth-constant.ll
new file mode 100644
index 0..fdd6352cf8469
--- /dev/null
+++ b/llvm/test/Verifier/ptrauth-constant.ll
@@ -0,0 +1,6 @@
+; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s
+
+@g = external global i8
+
+; CHECK: signed ptrauth constant deactivation symbol must be a global value or null
+@ptr = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr inttoptr (i64 16 to ptr))

>From 60e836e71bf9aabe9dade2bda1ca38107f76b599 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Mon, 8 Sep 2025 17:34:59 -0700
Subject: [PATCH 2/5] Address review comment

Created using spr 1.3.6-beta.1
---
 llvm/lib/IR/Constants.cpp | 1 +
 llvm/test/Assembler/invalid-ptrauth-const6.ll | 6 ++
 2 files changed, 7 insertions(+)
 create mode 100644 llvm/test/Assembler/invalid-ptrauth-const6.ll

diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp
index 5eacc7af1269b..53b292f90c03d 100644
--- a/llvm/lib/IR/Constants.cpp
+++ b/llvm/lib/IR/Constants.cpp
@@ -2082,6 +2082,7 @@ ConstantPtrAuth::ConstantPtrAuth(Constant *Ptr, ConstantInt *Key,
   assert(Key->getBitWidth() == 32);
   assert(Disc->getBitWidth() == 64);
   assert(AddrDisc->getType()->isPointerTy());
+  assert(DeactivationSymbol->getType()->isPointerTy());
   setOperand(0, Ptr);
   setOperand(1, Key);
   setOperand(2, Disc);
diff --git a/llvm/test/Assembler/invalid-ptrauth-const6.ll b/llvm/test/Assembler/invalid-ptrauth-const6.ll
new file mode 100644
index 0..6e8e1d386acc8
--- /dev/null
+++ b/llvm/test/Assembler/invalid-ptrauth-const6.ll
@@ -0,0 +1,6 @@
+; RUN: not llvm-as < %s 2>&1 | FileCheck %s
+
+@var = global i32 0
+
+; CHECK: error: constant ptrauth deactivation symbol must be a pointer
+@ptr = global ptr ptrauth (ptr @var, i32 0, i64 65535, ptr null, i64 0)

>From a780d181fa69236d5909759a24a1134b50313980 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Tue, 9 Sep 2025 17:18:49 -0700
Subject: [PATCH 3/5] Address review comment

Created using spr 1.3.6-beta.1
---
 llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 3 +++
 llvm/lib/IR/Verifier.cpp  | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 045ed204620fb..04fe4c57af6ed 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1613,6 +1613,9 @@ Expected<Value *> BitcodeReader::materializeValue(unsigned StartValID,
           ConstOps.size() > 4 ? ConstOps[4]
                               : ConstantPointerNull::get(cast<PointerType>(
                                     ConstOps[3]->getType()));
+      if (!DeactivationSymbol->getType()->isPointerTy())
+        return error(
+            "ptrauth deactivation symbol operand must be a pointer");
 
   C = ConstantPtrAuth::get(ConstOps[0], Key, Disc, ConstOps[3],
DeactivationSymbol);
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 9e44dfb387615..a53ba17e26011 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -2632,6 +2632,9 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) {
   Check(CPA->getDiscriminator()->getBitWidth() == 64,
 "signed ptrauth constant discriminator must be i64 constant integer");
 
+  Check(CPA->getDeactivationSymbol()->getType()->isPointerTy(),
+"signed ptrauth constant deactivation symbol must be a pointer");
+
   Check(isa<GlobalValue>(CPA->getDeactivationSymbol()) ||
         CPA->getDeactivationSymbol()->isNullValue(),
         "signed ptrauth constant deactivation symbol must be a global value "

>From 51c353bbde24f940e3dfd7488aec0682dbef260b Mon Se
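
To summarize the verifier rules from the (truncated) patches above — a sketch assembled from the quoted tests, not an authoritative grammar — the deactivation symbol is the fifth operand of a ptrauth constant and must be a global value or null:

```llvm
@g = external global i8
@ds = external global i8

; No deactivation symbol: the fifth operand is null.
@p0 = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr null)

; Deactivation symbol @ds attached to the signed pointer.
@p1 = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr @ds)

; Rejected by the verifier: neither null nor a global value.
; @bad = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr inttoptr (i64 16 to ptr))
```

The assembler additionally rejects a non-pointer fifth operand (see invalid-ptrauth-const6.ll above), and the bitcode reader mirrors that check when materializing constants.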

[llvm-branch-commits] [llvm] [libc++] Test triggering a benchmarking job comment (PR #158138)

2025-09-12 Thread Louis Dionne via llvm-branch-commits

ldionne wrote:

/libcxx-bot benchmark libcxx/test/benchmarks/join_view.bench.cpp 
libcxx/test/benchmarks/hash.bench.cpp

https://github.com/llvm/llvm-project/pull/158138


[llvm-branch-commits] [llvm] [libc++] Test triggering a benchmarking job comment (PR #158138)

2025-09-12 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne closed https://github.com/llvm/llvm-project/pull/158138


[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

> This isn't possible, the symbols are resolved at static link time. See the 
> RFC for more information: 
> https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Oh wait, I have completely misunderstood that - I have always assumed dynamic 
link and that's the reason for a bunch of the concerns I raised, that I now 
assume sounded really weird :D

https://github.com/llvm/llvm-project/pull/133537


[llvm-branch-commits] [llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)

2025-09-12 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl edited https://github.com/llvm/llvm-project/pull/128785


[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits


@@ -2135,6 +2135,11 @@ bool ConstantPtrAuth::hasSpecialAddressDiscriminator(uint64_t Value) const {
 bool ConstantPtrAuth::isKnownCompatibleWith(const Value *Key,
                                             const Value *Discriminator,
                                             const DataLayout &DL) const {
+  // This function may only be validly called to analyze a ptrauth operation with
+  // no deactivation symbol, so if we have one it isn't compatible.
+  if (!getDeactivationSymbol()->isNullValue())

ojhunt wrote:

Sigh, IR vs clang again - I was thinking about this in the context of qualified 
type compatibility. Sigh.

https://github.com/llvm/llvm-project/pull/133537


[llvm-branch-commits] [clang] Add pointer field protection feature. (PR #133538)

2025-09-12 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

> @pcc and I have been discussing this.
> 
> * The perf issues I was concerned about were predicated on access to a 
> pointer loaded from a field continuing to be checked after the original field 
> load, this is not the case (and in hindsight doing so would imply passing the 
> pointer as a parameter to a function would maintain the tag and require the 
> target knowing about it).

For people following along, despite multiple different places saying the symbol 
resolution is static, I'm a muppet and thought this was a dynamic link check, 
hence had all sorts of problems.

However it's a static link time check, so I'm a muppet and many of my concerns 
are irrelevant.

https://github.com/llvm/llvm-project/pull/133538


[llvm-branch-commits] [Clang] Port ulimit tests to work with internal shell (PR #157977)

2025-09-12 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi approved this pull request.


https://github.com/llvm/llvm-project/pull/157977


[llvm-branch-commits] [llvm] [lit] Implement ulimit builtin (PR #157958)

2025-09-12 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi approved this pull request.


https://github.com/llvm/llvm-project/pull/157958


  1   2   3   >