[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)
@@ -73,8 +74,9 @@ class SanitizerArgs { bool HwasanUseAliases = false; llvm::AsanDetectStackUseAfterReturnMode AsanUseAfterReturn = llvm::AsanDetectStackUseAfterReturnMode::Invalid; - fmayer wrote: stray change https://github.com/llvm/llvm-project/pull/156839
[llvm-branch-commits] [clang] Add pointer field protection feature. (PR #133538)
@@ -2201,6 +2215,22 @@ void CodeGenFunction::EmitCXXConstructorCall( EmitTypeCheck(CodeGenFunction::TCK_ConstructorCall, Loc, This, getContext().getRecordType(ClassDecl), CharUnits::Zero()); + // When initializing an object that has pointer field protection and whose + // fields are not trivially relocatable we must initialize any pointer fields + // to a valid signed pointer (any pointer value will do, but we just use null + // pointers). This is because if the object is subsequently copied, its copy + // constructor will need to read and authenticate any pointer fields in order + // to copy the object to a new address, which will fail if the pointers are + // uninitialized. + if (!getContext().arePFPFieldsTriviallyRelocatable(D->getParent())) { pcc wrote: Looking more closely through the standard confirms that we don't need to do this initialization in the compiler. Because the uninitialized fields may be considered to be what the standard calls "invalid pointer values", the standard gives us a lot of leeway for implementation-defined behavior when reading them. The standard specifically calls out what we want to happen here: > Some implementations might define that copying an invalid pointer value > causes a system-generated runtime fault. In practice there seem to be only a few places that need to be fixed, so we can just fix them. https://github.com/llvm/llvm-project/pull/133538
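To make the failure mode concrete, here is a purely illustrative sketch of the scenario pcc describes; the struct and the signed-pointer behavior are assumptions for exposition, not the actual PFP implementation:

```cpp
// Illustration only: assume 'next' is stored in signed (authenticated)
// form under pointer field protection. None of these names are real API.
struct Node {
  Node *next; // assumed to be a PFP-protected field
};

void copyUninitialized() {
  Node a;     // 'a.next' holds an indeterminate bit pattern
  Node b = a; // the implicit copy reads and authenticates 'a.next' so it
              // can re-sign it for 'b'; authenticating garbage bits is
              // exactly the "system-generated runtime fault" case the
              // quoted standard wording permits
}
```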
[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)
https://github.com/jdenny-ornl created https://github.com/llvm/llvm-project/pull/157754 This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812. In summary, for the case of partial loop unrolling without a runtime, this patch changes LoopUnroll to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758. - Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2). There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a runtime and complete unrolling, and there are two associated tests this patch marks as XFAIL. They will be addressed in future patches that should land with this patch. >From 75a8df62df2ef7e8c02d7a76120e57e2dd1a1539 Mon Sep 17 00:00:00 2001 From: "Joel E. Denny" Date: Tue, 9 Sep 2025 17:33:38 -0400 Subject: [PATCH] [LoopUnroll] Fix block frequencies when no runtime This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812. In summary, for the case of partial loop unrolling without a runtime, this patch changes LoopUnroll to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758. - Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2). There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a runtime and complete unrolling, and there are two associated tests this patch marks as XFAIL. They will be addressed in future patches that should land with this patch. 
--- llvm/lib/Transforms/Utils/LoopUnroll.cpp | 36 -- .../peel.ll} | 0 .../branch-weights-freq/unroll-partial.ll | 68 +++ .../LoopUnroll/runtime-loop-branchweight.ll | 1 + .../LoopUnroll/unroll-heuristics-pgo.ll | 1 + 5 files changed, 100 insertions(+), 6 deletions(-) rename llvm/test/Transforms/LoopUnroll/{peel-branch-weights-freq.ll => branch-weights-freq/peel.ll} (100%) create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp index 8a6c7789d1372..93c43396c54b6 100644 --- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp @@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L); const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L); - unsigned EstimatedLoopInvocationWeight = 0; std::optional OriginalTripCount = - llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight); + llvm::getLoopEstimatedTripCount(L); // Effectively "DCE" unrolled iterations that are beyond the max tripcount // and will never be executed. @@ -1130,10 +1129,35 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, // We shouldn't try to use `L` anymore. L = nullptr; } else if (OriginalTripCount) { -// Update the trip count. Note that the remainder has already logic -// computing it in `UnrollRuntimeLoopRemainder`. -setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count, - EstimatedLoopInvocationWeight); +// Update metadata for the estimated trip count. +// +// If ULO.Runtime, UnrollRuntimeLoopRemainder handles branch weights for the +// remainder loop it creates, and the unrolled loop's branch weights are +// adjusted below. Otherwise, if unrolled loop iterations' latches become +// unconditional, branch weights are adjusted above. Otherwise, the +// original loop's branch weights are correct for the unrolled loop, so do +// not adjust them. +// FIXME: Actually handle such unconditional latches and ULO.Runtime. +// +// For example, consider what happens if the unroll count is 4 for a loop +// with an estimated trip count of 10 when we do not create a remainder loop +// and all iterations' latches remain conditional.
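The third bullet's rounding rule can be stated in one line: the new estimate must be the ceiling, not the truncation, of the division, because the leftover iterations still run the unrolled body one more time. A minimal sketch, not the patch's actual code:

```cpp
#include <cassert>

// ceil(OrigTripCount / UnrollCount), assuming UnrollCount > 0.
unsigned estimateUnrolledTripCount(unsigned OrigTripCount, unsigned UnrollCount) {
  return (OrigTripCount + UnrollCount - 1) / UnrollCount;
}

int main() {
  assert(estimateUnrolledTripCount(10, 4) == 3); // remainder 2, so 3 rather than 2
  assert(estimateUnrolledTripCount(8, 4) == 2);  // exact division is unchanged
}
```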
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
@@ -1349,6 +1350,98 @@ void CodeGenFunction::EmitAllocTokenHint(llvm::CallBase *CB, CB->setMetadata(llvm::LLVMContext::MD_alloc_token_hint, MDN); } +/// Infer type from a simple sizeof expression. +static QualType inferTypeFromSizeofExpr(const Expr *E) { + const Expr *Arg = E->IgnoreParenImpCasts(); + if (const auto *UET = dyn_cast<UnaryExprOrTypeTraitExpr>(Arg)) { +if (UET->getKind() == UETT_SizeOf) { + if (UET->isArgumentType()) { +return UET->getArgumentTypeInfo()->getType(); + } else { +return UET->getArgumentExpr()->getType(); + } +} + } + return QualType(); +} + +/// Infer type from an arithmetic expression involving a sizeof. +static QualType inferTypeFromArithSizeofExpr(const Expr *E) { + const Expr *Arg = E->IgnoreParenImpCasts(); + // The argument is a lone sizeof expression. + QualType QT = inferTypeFromSizeofExpr(Arg); fmayer wrote: ``` if (QualType QT = inferTypeFromSizeofExpr(Arg); !QT.isNull()) return QT; ``` and below https://github.com/llvm/llvm-project/pull/156841
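Spelled out, the suggestion folds the declaration and the null check into a C++17 if-with-initializer; a sketch of how the helper might then read (the later binary-operator cases are truncated above, so the fall-through is assumed):

```cpp
static QualType inferTypeFromArithSizeofExpr(const Expr *E) {
  const Expr *Arg = E->IgnoreParenImpCasts();
  // The argument is a lone sizeof expression.
  if (QualType QT = inferTypeFromSizeofExpr(Arg); !QT.isNull())
    return QT;
  // ... each operand of the arithmetic cases below would use the same
  // pattern instead of a mutable QT checked after the fact.
  return QualType();
}
```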
[llvm-branch-commits] [lit] Support -c flag for diff (PR #157584)
https://github.com/boomanaiden154 closed https://github.com/llvm/llvm-project/pull/157584
[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)
@@ -2367,6 +2371,16 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, } } + if (const auto *Arg = Args.getLastArg(options::OPT_falloc_token_max_EQ)) { +StringRef S = Arg->getValue(); +uint64_t Value = 0; +if (S.getAsInteger(0, Value)) { fmayer wrote: remove braces https://github.com/llvm/llvm-project/pull/156839
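For context, LLVM's coding standard omits braces around single-statement bodies, so the flagged `if` would become something like the following; the diagnostic call is a hypothetical placeholder, since the real body is truncated above:

```cpp
StringRef S = Arg->getValue();
uint64_t Value = 0;
if (S.getAsInteger(0, Value))
  reportInvalidTokenMax(Diags, Arg, S); // placeholder for the elided body
```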
[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157232 >From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Sun, 7 Sep 2025 17:39:19 + Subject: [PATCH 1/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 17 + llvm/test/MC/COFF/stdin.s | 1 - 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 llvm/test/MC/COFF/stdin.py delete mode 100644 llvm/test/MC/COFF/stdin.s diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py new file mode 100644 index 0..8b7b6ae1fba13 --- /dev/null +++ b/llvm/test/MC/COFF/stdin.py @@ -0,0 +1,17 @@ +# RUN: echo "// comment" > %t.input +# RUN: which llvm-mc | %python %s %t + +import subprocess +import sys + +llvm_mc_binary = sys.stdin.readlines()[0].strip() +temp_file = sys.argv[1] +input_file = temp_file + ".input" + +with open(temp_file, "w") as mc_stdout: +mc_stdout.seek(4) +subprocess.run( +[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +stdout=mc_stdout, +check=True, +) diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s deleted file mode 100644 index 8ceae7fdef501..0 --- a/llvm/test/MC/COFF/stdin.s +++ /dev/null @@ -1 +0,0 @@ -// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s ) > %t' >From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:28:59 + Subject: [PATCH 2/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 8b7b6ae1fba13..1d9b50c022523 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,14 +1,22 @@ # RUN: echo "// comment" > %t.input # RUN: which llvm-mc | %python %s %t +import argparse import subprocess import sys +parser = argparse.ArgumentParser() +parser.add_argument("temp_file") +arguments = parser.parse_args() + llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = sys.argv[1] +temp_file = arguments.temp_file input_file = temp_file + ".input" with open(temp_file, "w") as mc_stdout: +## We need to test that starting on an input stream with a non-zero offset +## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek +## past zero for STDOUT. 
mc_stdout.seek(4) subprocess.run( [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], >From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:43:39 + Subject: [PATCH 3/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 1d9b50c022523..0da1b4895142b 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,25 +1,30 @@ # RUN: echo "// comment" > %t.input -# RUN: which llvm-mc | %python %s %t +# RUN: which llvm-mc | %python %s %t.input %t import argparse import subprocess import sys parser = argparse.ArgumentParser() +parser.add_argument("input_file") parser.add_argument("temp_file") arguments = parser.parse_args() llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = arguments.temp_file -input_file = temp_file + ".input" -with open(temp_file, "w") as mc_stdout: +with open(arguments.temp_file, "w") as mc_stdout: ## We need to test that starting on an input stream with a non-zero offset ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek ## past zero for STDOUT. mc_stdout.seek(4) subprocess.run( -[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +[ +llvm_mc_binary, +"-filetype=obj", +"-triple", +"i686-pc-win32", +arguments.input_file, +], stdout=mc_stdout, check=True, )
[llvm-branch-commits] [AllocToken, Clang] Infer type hints from sizeof expressions and casts (PR #156841)
https://github.com/melver updated https://github.com/llvm/llvm-project/pull/156841
[llvm-branch-commits] [llvm] Use lit internal shell by default (PR #157237)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157237
[llvm-branch-commits] [clang] d47a574 - Revert "[HLSL] Rewrite semantics parsing (#152537)"
Author: Nathan Gauër Date: 2025-09-09T19:11:28+02:00 New Revision: d47a574d9ab76ae599a1d9dadbbaf9709ab35758 URL: https://github.com/llvm/llvm-project/commit/d47a574d9ab76ae599a1d9dadbbaf9709ab35758 DIFF: https://github.com/llvm/llvm-project/commit/d47a574d9ab76ae599a1d9dadbbaf9709ab35758.diff LOG: Revert "[HLSL] Rewrite semantics parsing (#152537)" This reverts commit 57e1846c96f0c858f687fe9c66f4e3793b52f497. Added: Modified: clang/include/clang/AST/Attr.h clang/include/clang/Basic/Attr.td clang/include/clang/Basic/DiagnosticFrontendKinds.td clang/include/clang/Basic/DiagnosticParseKinds.td clang/include/clang/Basic/DiagnosticSemaKinds.td clang/include/clang/Parse/Parser.h clang/include/clang/Sema/SemaHLSL.h clang/lib/Basic/Attributes.cpp clang/lib/CodeGen/CGHLSLRuntime.cpp clang/lib/CodeGen/CGHLSLRuntime.h clang/lib/Parse/ParseHLSL.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/lib/Sema/SemaHLSL.cpp clang/test/CodeGenHLSL/semantics/SV_Position.ps.hlsl clang/test/ParserHLSL/semantic_parsing.hlsl clang/test/SemaHLSL/Semantics/invalid_entry_parameter.hlsl clang/utils/TableGen/ClangAttrEmitter.cpp Removed: clang/test/CodeGenHLSL/semantics/DispatchThreadID-noindex.hlsl clang/test/CodeGenHLSL/semantics/SV_GroupID-noindex.hlsl clang/test/CodeGenHLSL/semantics/SV_GroupThreadID-noindex.hlsl clang/test/CodeGenHLSL/semantics/missing.hlsl clang/test/ParserHLSL/semantic_parsing_define.hlsl diff --git a/clang/include/clang/AST/Attr.h b/clang/include/clang/AST/Attr.h index fe388b9fa045e..994f236337b99 100644 --- a/clang/include/clang/AST/Attr.h +++ b/clang/include/clang/AST/Attr.h @@ -232,40 +232,6 @@ class HLSLAnnotationAttr : public InheritableAttr { } }; -class HLSLSemanticAttr : public HLSLAnnotationAttr { - unsigned SemanticIndex = 0; - LLVM_PREFERRED_TYPE(bool) - unsigned SemanticIndexable : 1; - LLVM_PREFERRED_TYPE(bool) - unsigned SemanticExplicitIndex : 1; - -protected: - HLSLSemanticAttr(ASTContext &Context, const AttributeCommonInfo &CommonInfo, - attr::Kind AK, bool IsLateParsed, - bool InheritEvenIfAlreadyPresent, bool SemanticIndexable) - : HLSLAnnotationAttr(Context, CommonInfo, AK, IsLateParsed, - InheritEvenIfAlreadyPresent) { -this->SemanticIndexable = SemanticIndexable; -this->SemanticExplicitIndex = false; - } - -public: - bool isSemanticIndexable() const { return SemanticIndexable; } - - void setSemanticIndex(unsigned SemanticIndex) { -this->SemanticIndex = SemanticIndex; -this->SemanticExplicitIndex = true; - } - - unsigned getSemanticIndex() const { return SemanticIndex; } - - // Implement isa/cast/dyncast/etc. - static bool classof(const Attr *A) { -return A->getKind() >= attr::FirstHLSLSemanticAttr && - A->getKind() <= attr::LastHLSLSemanticAttr; - } -}; - /// A parameter attribute which changes the argument-passing ABI rule /// for the parameter. class ParameterABIAttr : public InheritableParamAttr { diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td index b85abfcbecfcf..10bf96a50c982 100644 --- a/clang/include/clang/Basic/Attr.td +++ b/clang/include/clang/Basic/Attr.td @@ -779,16 +779,6 @@ class DeclOrStmtAttr : InheritableAttr; /// An attribute class for HLSL Annotations. class HLSLAnnotationAttr : InheritableAttr; -class HLSLSemanticAttr : HLSLAnnotationAttr { - bit SemanticIndexable = Indexable; - int SemanticIndex = 0; - bit SemanticExplicitIndex = 0; - - let Spellings = []; - let Subjects = SubjectList<[ParmVar, Field, Function]>; - let LangOpts = [HLSL]; -} - /// A target-specific attribute. 
This class is meant to be used as a mixin /// with InheritableAttr or Attr depending on the attribute's needs. class TargetSpecificAttr { @@ -4900,6 +4890,27 @@ def HLSLNumThreads: InheritableAttr { let Documentation = [NumThreadsDocs]; } +def HLSLSV_GroupThreadID: HLSLAnnotationAttr { + let Spellings = [HLSLAnnotation<"sv_groupthreadid">]; + let Subjects = SubjectList<[ParmVar, Field]>; + let LangOpts = [HLSL]; + let Documentation = [HLSLSV_GroupThreadIDDocs]; +} + +def HLSLSV_GroupID: HLSLAnnotationAttr { + let Spellings = [HLSLAnnotation<"sv_groupid">]; + let Subjects = SubjectList<[ParmVar, Field]>; + let LangOpts = [HLSL]; + let Documentation = [HLSLSV_GroupIDDocs]; +} + +def HLSLSV_GroupIndex: HLSLAnnotationAttr { + let Spellings = [HLSLAnnotation<"sv_groupindex">]; + let Subjects = SubjectList<[ParmVar, GlobalVar]>; + let LangOpts = [HLSL]; + let Documentation = [HLSLSV_GroupIndexDocs]; +} + def HLSLVkBinding : InheritableAttr { let Spellings = [CXX11<"vk", "binding">]; let Subjects = SubjectList<[HLSLBufferObj, ExternalGlobalVar], ErrorDiag>; @@ -4958,35 +4969,13 @@ def HLSLResourceBinding: InheritableAttr { }]; } -
[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (PR #157640)
https://github.com/rampitec approved this pull request. https://github.com/llvm/llvm-project/pull/157640
[llvm-branch-commits] [llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/128785 >From f4135207e955f6c2e358cad54a7ef6f2f18087f8 Mon Sep 17 00:00:00 2001 From: "Joel E. Denny" Date: Wed, 19 Mar 2025 16:19:40 -0400 Subject: [PATCH 1/9] [LoopPeel] Fix branch weights' effect on block frequencies For example: ``` declare void @f(i32) define void @test(i32 %n) { entry: br label %do.body do.body: %i = phi i32 [ 0, %entry ], [ %inc, %do.body ] %inc = add i32 %i, 1 call void @f(i32 %i) %c = icmp sge i32 %inc, %n br i1 %c, label %do.end, label %do.body, !prof !0 do.end: ret void } !0 = !{!"branch_weights", i32 1, i32 9} ``` Given those branch weights, once any loop iteration is actually reached, the probability of the loop exiting at the iteration's end is 1/(1+9). That is, the loop is likely to exit every 10 iterations and thus has an estimated trip count of 10. `opt -passes='print'` shows that 10 is indeed the frequency of the loop body: ``` Printing analysis results of BFI for function 'test': block-frequency-info: test - entry: float = 1.0, int = 1801439852625920 - do.body: float = 10.0, int = 18014398509481984 - do.end: float = 1.0, int = 1801439852625920 ``` Key Observation: The frequency of reaching any particular iteration is less than for the previous iteration because the previous iteration has a non-zero probability of exiting the loop. This observation holds even though every loop iteration, once actually reached, has exactly the same probability of exiting and thus exactly the same branch weights. Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to peel 2 iterations and insert them before the remaining loop. We expect the key observation above not to change, but it does under the implementation without this patch. The block frequency becomes 1.0 for the first iteration, 0.9 for the second, and 6.4 for the main loop body. Again, a decreasing frequency is expected, but it decreases too much: the total frequency of the original loop body becomes 8.3. The new branch weights reveal the problem: ``` !0 = !{!"branch_weights", i32 1, i32 9} !1 = !{!"branch_weights", i32 1, i32 8} !2 = !{!"branch_weights", i32 1, i32 7} ``` The exit probability is now 1/10 for the first peeled iteration, 1/9 for the second, and 1/8 for the remaining loop iterations. It seems this behavior is trying to ensure a decreasing block frequency. However, as in the key observation above for the original loop, that happens correctly without decreasing the branch weights across iterations. This patch changes the peeling implementation not to decrease the branch weights across loop iterations so that the frequency for every iteration is the same as it was in the original loop. The total frequency of the loop body, summed across all its occurrences, thus remains 10 after peeling. Unfortunately, that change means a later analysis cannot accurately estimate the trip count of the remaining loop while examining the remaining loop in isolation without considering the probability of actually reaching it. For that purpose, this patch stores the new trip count as separate metadata named `llvm.loop.estimated_trip_count` and extends `llvm::getLoopEstimatedTripCount` to prefer it, if present, over branch weights. An alternative fix is for `llvm::getLoopEstimatedTripCount` to subtract the `llvm.loop.peeled.count` metadata from the trip count estimated by a loop's branch weights. However, there might be other loop transformations that still corrupt block frequencies in a similar manner and require a similar fix. 
`llvm.loop.estimated_trip_count` is intended to provide a general way to store estimated trip counts when branch weights cannot directly store them. This patch introduces several fixme comments that need to be addressed before it can land. --- .../include/llvm/Transforms/Utils/LoopUtils.h | 25 ++- llvm/lib/Transforms/Utils/LoopPeel.cpp| 145 +++--- llvm/lib/Transforms/Utils/LoopUtils.cpp | 20 ++- .../LoopUnroll/peel-branch-weights-freq.ll| 75 + .../LoopUnroll/peel-branch-weights.ll | 64 .../LoopUnroll/peel-loop-pgo-deopt.ll | 11 +- .../Transforms/LoopUnroll/peel-loop-pgo.ll| 13 +- .../Transforms/LoopVectorize/X86/pr81872.ll | 18 ++- 8 files changed, 217 insertions(+), 154 deletions(-) create mode 100644 llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index 8f4c0c88336ac..82d23a4b68ea1 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -315,7 +315,8 @@ TransformationMode hasLICMVersioningTransformation(const Loop *L); void addStringMetadataToLoop(Loop *TheLoop, const char *MDString, unsigned V = 0); -/// Returns a loop's estimated trip count based on branch weight metadata. +
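The frequency arithmetic the commit message relies on, written out explicitly (standard geometric-distribution reasoning; the notation is ours, not the patch's):

```latex
\[
  p_{\mathrm{exit}} = \frac{w_{\mathrm{exit}}}{w_{\mathrm{exit}} + w_{\mathrm{latch}}}
                    = \frac{1}{1 + 9} = \frac{1}{10},
  \qquad
  \mathbb{E}[\text{iterations}] = \frac{1}{p_{\mathrm{exit}}} = 10.
\]
```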
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
https://github.com/ojhunt edited https://github.com/llvm/llvm-project/pull/156842
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -846,6 +836,22 @@ static void addSanitizers(const Triple &TargetTriple, } } +static void addAllocTokenPass(const Triple &TargetTriple, ojhunt wrote: I'd rather separate sema changes from codegen https://github.com/llvm/llvm-project/pull/156842
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -5760,6 +5764,24 @@ bool Sema::BuiltinAllocaWithAlign(CallExpr *TheCall) { return false; } +bool Sema::BuiltinAllocTokenInfer(CallExpr *TheCall) { ojhunt wrote: I would prefer this not be a Sema member, and would prefer the static function with a `Sema&` parameter model instead? I find the `Sema::Builtin*(..)` naming model to be unnecessarily noisy and confusing, and as it looks like there's a mix of `Sema::` methods and `static` functions, I wonder if there's a strong preference among others? cc @Endilll, and @AaronBallman (who I believe is on vacation or similar so I would expect/hope not to get an immediate reply) https://github.com/llvm/llvm-project/pull/156842
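The shape ojhunt is asking for, as a sketch (the checker body is elided because the hunk above is truncated; only the signature convention is the point):

```cpp
// A file-local checker that takes Sema by reference instead of being
// another Sema member:
static bool checkBuiltinAllocTokenInfer(Sema &S, CallExpr *TheCall) {
  // Argument-count and type checks would go here, reporting via S.Diag(...).
  (void)S;
  (void)TheCall;
  return false;
}
```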
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache { SanitizerAnnotateDebugInfo(ArrayRef Ordinals, SanitizerHandler Handler); - /// Emit additional metadata used by the AllocToken instrumentation. + /// Emit metadata used by the AllocToken instrumentation. + llvm::MDNode *EmitAllocTokenHint(QualType AllocType); ojhunt wrote: Or Compute? or something other than Emit. You're not emitting the hint, you're simply constructing it to permit it to be usable in multiple places :D https://github.com/llvm/llvm-project/pull/156842
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156837 >From adf9d42e554437a8e816e190a8ad64ae4770404c Mon Sep 17 00:00:00 2001 From: ergawy Date: Thu, 4 Sep 2025 01:06:21 -0500 Subject: [PATCH] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used resulting in emitting BBs with multiple terminators. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 + .../omptarget-multi-block-reduction.mlir | 85 +++ 2 files changed, 88 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3d5e487c8990f..fe00a2a5696dc 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -3506,6 +3506,8 @@ Expected OpenMPIRBuilder::createReductionFunction( return AfterIP.takeError(); if (!Builder.GetInsertBlock()) return ReductionFunc; + + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHSPtr); } } @@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU( RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced); if (!AfterIP) return AfterIP.takeError(); + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHS, false); } } diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir new file mode 100644 index 0..aaf06d2d0e0c2 --- /dev/null +++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir @@ -0,0 +1,85 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Verifies that the IR builder can handle reductions with multi-block combiner +// regions on the GPU. 
+ +module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} { + llvm.func @bar() {} + llvm.func @baz() {} + + omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc { +%0 = llvm.mlir.constant(1 : i64) : i64 +%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5> +%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr +omp.yield(%2 : !llvm.ptr) + } init { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +omp.yield(%arg1 : !llvm.ptr) + } combiner { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +llvm.call @bar() : () -> () +llvm.br ^bb3 + + ^bb3: // pred: ^bb1 +llvm.call @baz() : () -> () +omp.yield(%arg0 : !llvm.ptr) + } + llvm.func @foo_() { +%c1 = llvm.mlir.constant(1 : i64) : i64 +%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5> +%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr +%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"} +omp.target map_entries(%74 -> %arg0 : !llvm.ptr) { + %c1_2 = llvm.mlir.constant(1 : i32) : i32 + %c10 = llvm.mlir.constant(10 : i32) : i32 + omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) { +omp.parallel { + omp.distribute { +omp.wsloop { + omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) { +omp.yield + } +} {omp.composite} + } {omp.composite} + omp.terminator +} {omp.composite} +omp.terminator + } + omp.terminator +} +llvm.return + } +} + +// CHECK: call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1, +// CHECK-SAME: ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1) + +// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} { +// CHECK: .omp.reduction.then: +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar() +// CHECK: br label %[[BODY_2ND_BB:.*]] + +// CHECK: [[BODY_2ND_BB]]: +// CHECK: call void @baz() +// CHECK: br label %[[CONT_BB:.*]] + +// CHECK: [[CONT_BB]]: +// CHECK: br label %.omp.reduction.done +// CHECK: } + +// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef %0, ptr noundef %1) #0 { +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: [[BODY_2ND_BB:.*]]: +// CHECK: call void @baz() +// CHECK: br label %omp.region.cont + + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar()
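Both hunks enforce the same invariant: after a callback inlines a possibly multi-block region, the builder must be moved to the returned insertion point before emitting anything else, or the next instruction lands in a block that already has a terminator. A stand-alone IRBuilder illustration of that invariant (generic, not OMPIRBuilder itself):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  Function *F = Function::Create(FunctionType::get(Type::getVoidTy(Ctx), false),
                                 Function::ExternalLinkage, "f", M);
  BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
  BasicBlock *Cont = BasicBlock::Create(Ctx, "cont", F);
  IRBuilder<> B(Entry);
  B.CreateBr(Cont); // 'entry' is now terminated; emitting more here would
                    // create a second terminator, which is invalid IR
  B.SetInsertPoint(Cont); // the analogue of SetInsertPoint(AfterIP->getBlock(), ...)
  B.CreateRetVoid();
}
```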
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/157638 >From cbb2c67df6d5a234dc66ae012f88c1ff36f1ac47 Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 05:54:00 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and uses the mapped value as the `private` clause operand in the nested `omp.parallel` op. --- .../include/flang/Optimizer/Dialect/FIROps.td | 12 ++ .../OpenMP/DoConcurrentConversion.cpp | 192 +++--- .../Transforms/DoConcurrent/local_device.mlir | 49 + 3 files changed, 175 insertions(+), 78 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index bc971e8fd6600..fc6eedc6ed4c6 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop", return getReduceVars().size(); } +unsigned getInductionVarsStart() { + return 0; +} + +unsigned getLocalOperandsStart() { + return getNumInductionVars(); +} + +unsigned getReduceOperandsStart() { + return getLocalOperandsStart() + getNumLocalOperands(); +} + mlir::Block::BlockArgListType getInductionVars() { return getBody()->getArguments().slice(0, getNumInductionVars()); } diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 6c71924000842..d00a4fdd2cf2e 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, liveIns.push_back(operand->get()); }); + + for (mlir::Value local : loop.getLocalVars()) +liveIns.push_back(local); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -298,8 +301,7 @@ class DoConcurrentConversion .getIsTargetDevice(); mlir::omp::TargetOperands targetClauseOps; - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps, + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps, isTargetDevice ? nullptr : &targetClauseOps); LiveInShapeInfoMap liveInShapeInfoMap; @@ -321,14 +323,13 @@ class DoConcurrentConversion } mlir::omp::ParallelOp parallelOp = -genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper); +genParallelOp(rewriter, loop, ivInfos, mapper); // Only set as composite when part of `distribute parallel do`. parallelOp.setComposite(mapToDevice); if (!mapToDevice) - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps); + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps); for (mlir::Value local : locals) looputils::localizeLoopLocalValue(local, parallelOp.getRegion(), @@ -337,10 +338,38 @@ class DoConcurrentConversion if (mapToDevice) genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true); -mlir::omp::LoopNestOp ompLoopNest = +auto [loopNestOp, wsLoopOp] = genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps, /*isComposite=*/mapToDevice); +// `local` region arguments are transferred/cloned from the `do concurrent` +// loop to the loopnest op when the region is cloned above. Instead, these +// region arguments should be on the workshare loop's region. 
+if (mapToDevice) { + for (auto [parallelArg, loopNestArg] : llvm::zip_equal( + parallelOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getLocalOperandsStart(), loop.getNumLocalOperands( +rewriter.replaceAllUsesWith(loopNestArg, parallelArg); + + for (auto [wsloopArg, loopNestArg] : llvm::zip_equal( + wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getReduceOperandsStart(), loop.getNumReduceOperands( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} else { + for (auto [wsloopArg, loopNestArg] : + llvm::zip_equal(wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().drop_front( + loopNestClauseOps.loopLowerBounds.size( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} + +for (unsigned i = 0; + i
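The argument remapping above leans on `llvm::zip_equal`, which iterates two ranges in lockstep and asserts that their lengths match; a toy stand-alone illustration (not the pass itself):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include <cassert>

int main() {
  llvm::SmallVector<int> From = {1, 2, 3};
  llvm::SmallVector<int> To = {0, 0, 0};
  // Pair up corresponding elements, much like pairing loop-nest block
  // arguments with parallel-op arguments for replaceAllUsesWith.
  for (auto [F, T] : llvm::zip_equal(From, To))
    T = F;
  assert(To[0] == 1 && To[2] == 3);
}
```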
[llvm-branch-commits] [llvm] Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (PR #157639)
Pierre-vh wrote: ### Merge activity * **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157639). https://github.com/llvm/llvm-project/pull/157639
[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Support "cluster" syncscope (PR #157641)
Pierre-vh wrote: ### Merge activity * **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157641). https://github.com/llvm/llvm-project/pull/157641
[llvm-branch-commits] [llvm] Revert "[AMDGPU][gfx1250] Add `cu-store` subtarget feature (#150588)" (PR #157639)
Pierre-vh wrote: > Why do we want to revert it? Can you put it into the description as well? It's not a feature we need anymore for gfx1250. I updated the description. https://github.com/llvm/llvm-project/pull/157639
[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)
https://github.com/ritter-x2a ready_for_review https://github.com/llvm/llvm-project/pull/157810
[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)
ritter-x2a wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests). * **#157810** 👈 (this PR, view in Graphite) * **#157682** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev. https://github.com/llvm/llvm-project/pull/157810
[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)
https://github.com/nikic approved this pull request. https://github.com/llvm/llvm-project/pull/157810
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/150170 >From be85e6c0222fe757ac59959bad5c56a85a32b869 Mon Sep 17 00:00:00 2001 From: Aaditya Date: Sat, 19 Jul 2025 12:57:27 +0530 Subject: [PATCH] Add builtins for wave reduction intrinsics --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 25 ++ clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 58 +++ clang/test/CodeGenOpenCL/builtins-amdgcn.cl | 378 +++ 3 files changed, 461 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e5a1422fe8778..56b1a8dc09b15 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr") BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n") BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n") +//===--===// + +// Wave Reduction builtins. + +//===--===// + +BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc") + //===--===// // R600-NI only builtins. 
//===--===// diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp index 87a46287c4022..07cf08c54985a 100644 --- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp +++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp @@ -295,11 +295,69 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst, Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs)); } +static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) { + switch (BuiltinID) { + default: +llvm_unreachable("Unknown BuiltinID for wave reduction"); + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64: +return Intrinsic::amdgcn_wave_reduce_add; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64: +return Intrinsic::amdgcn_wave_reduce_sub; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64: +return Intrinsic::amdgcn_wave_reduce_min; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64: +return Intrinsic::amdgcn_wave_reduce_umin; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64: +return Intrinsic::amdgcn_wave_reduce_max; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64: +return Intrinsic::amdgcn_wave_reduce_umax; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64: +return Intrinsic::amdgcn_wave_reduce_and; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64: +return Intrinsic::amdgcn_wave_reduce_or; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64: +return Intrinsic::amdgcn_wave_reduce_xor; + } +} + Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, const CallExpr *E) { llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent; llvm::SyncScope::ID SSID; switch (BuiltinID) { + case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32: + case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u
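Reading the `.def` strings, "ZUiZUiZi" declares a `uint32_t` result, a `uint32_t` value operand, and an `int32_t` operand; the sketch below assumes that last operand is the constant strategy selector the matching `llvm.amdgcn.wave.reduce.*` intrinsics take (0 selecting the default lowering), which is an assumption, not something the hunk shows:

```cpp
// Hedged usage sketch for AMDGPU device code (e.g. OpenCL or HIP):
unsigned waveSum(unsigned v) {
  // Assumed: second operand = reduction strategy, 0 = default.
  return __builtin_amdgcn_wave_reduce_add_u32(v, 0);
}
```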
[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)
https://github.com/heiher created https://github.com/llvm/llvm-project/pull/157819 This patch introduces the LASX and LSX conversion intrinsics: - __m256 __lasx_cast_128_s (__m128) - __m256d __lasx_cast_128_d (__m128d) - __m256i __lasx_cast_128 (__m128i) - __m256 __lasx_concat_128_s (__m128, __m128) - __m256d __lasx_concat_128_d (__m128, __m128d) - __m256i __lasx_concat_128 (__m128, __m128i) - __m128 __lasx_extract_128_lo_s (__m256) - __m128d __lasx_extract_128_lo_d (__m256d) - __m128i __lasx_extract_128_lo (__m256i) - __m128 __lasx_extract_128_hi_s (__m256) - __m128d __lasx_extract_128_hi_d (__m256d) - __m128i __lasx_extract_128_hi (__m256i) - __m256 __lasx_insert_128_lo_s (__m256, __m128) - __m256d __lasx_insert_128_lo_d (__m256d, __m128d) - __m256i __lasx_insert_128_lo (__m256i, __m128i) - __m256 __lasx_insert_128_hi_s (__m256, __m128) - __m256d __lasx_insert_128_hi_d (__m256d, __m128d) - __m256i __lasx_insert_128_hi (__m256i, __m128i) >From 91ca73f8a3ffa1b5e750252984e1a5d8f6097d28 Mon Sep 17 00:00:00 2001 From: WANG Rui Date: Wed, 10 Sep 2025 17:11:10 +0800 Subject: [PATCH] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics This patch introduces the LASX and LSX conversion intrinsics: - __m256 __lasx_cast_128_s (__m128) - __m256d __lasx_cast_128_d (__m128d) - __m256i __lasx_cast_128 (__m128i) - __m256 __lasx_concat_128_s (__m128, __m128) - __m256d __lasx_concat_128_d (__m128, __m128d) - __m256i __lasx_concat_128 (__m128, __m128i) - __m128 __lasx_extract_128_lo_s (__m256) - __m128d __lasx_extract_128_lo_d (__m256d) - __m128i __lasx_extract_128_lo (__m256i) - __m128 __lasx_extract_128_hi_s (__m256) - __m128d __lasx_extract_128_hi_d (__m256d) - __m128i __lasx_extract_128_hi (__m256i) - __m256 __lasx_insert_128_lo_s (__m256, __m128) - __m256d __lasx_insert_128_lo_d (__m256d, __m128d) - __m256i __lasx_insert_128_lo (__m256i, __m128i) - __m256 __lasx_insert_128_hi_s (__m256, __m128) - __m256d __lasx_insert_128_hi_d (__m256d, __m128d) - __m256i __lasx_insert_128_hi (__m256i, __m128i) --- .../clang/Basic/BuiltinsLoongArchLASX.def | 19 +++ clang/lib/Headers/lasxintrin.h| 110 .../CodeGen/LoongArch/lasx/builtin-alias.c| 153 + clang/test/CodeGen/LoongArch/lasx/builtin.c | 157 ++ 4 files changed, 439 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def index c4ea46a3bc5b5..b234dedad648e 100644 --- a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def +++ b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def @@ -986,3 +986,22 @@ TARGET_BUILTIN(__builtin_lasx_xbnz_b, "iV32Uc", "nc", "lasx") TARGET_BUILTIN(__builtin_lasx_xbnz_h, "iV16Us", "nc", "lasx") TARGET_BUILTIN(__builtin_lasx_xbnz_w, "iV8Ui", "nc", "lasx") TARGET_BUILTIN(__builtin_lasx_xbnz_d, "iV4ULLi", "nc", "lasx") + +TARGET_BUILTIN(__builtin_lasx_cast_128_s, "V8fV4f", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_cast_128_d, "V4dV2d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_cast_128, "V32ScV16Sc", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_concat_128_s, "V8fV4fV4f", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_concat_128_d, "V4dV2dV2d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_concat_128, "V32ScV16ScV16Sc", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_extract_128_lo_s, "V4fV8f", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_extract_128_lo_d, "V2dV4d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_extract_128_lo, "V16ScV32Sc", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_extract_128_hi_s, "V4fV8f", "nc", "lasx") 
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_d, "V2dV4d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_extract_128_hi, "V16ScV32Sc", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_lo_s, "V8fV8fV4f", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_lo_d, "V4dV4dV2d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_lo, "V32ScV32ScV16Sc", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_hi_s, "V8fV8fV4f", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_hi_d, "V4dV4dV2d", "nc", "lasx") +TARGET_BUILTIN(__builtin_lasx_insert_128_hi, "V32ScV32ScV16Sc", "nc", "lasx") diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h index 85020d82829e2..6dd8ac24ed46d 100644 --- a/clang/lib/Headers/lasxintrin.h +++ b/clang/lib/Headers/lasxintrin.h @@ -10,6 +10,8 @@ #ifndef _LOONGSON_ASXINTRIN_H #define _LOONGSON_ASXINTRIN_H 1 +#include + #if defined(__loongarch_asx) typedef signed char v32i8 __attribute__((vector_size(32), aligned(32))); @@ -3882,5 +3884,113 @@ extern __inline #define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1))) +extern __inline +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256 +__lasx_cast_128_s(__m128 _1) { + return (__m256)__builtin_lasx_cast_128_s((v4f32)_1); +} + +extern __inline +__attribute__((__gnu_inline__, __always_in
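A small usage sketch assembled only from the declarations above (the builtin signatures confirm the argument shapes; which half `concat` places each operand into is not shown in the hunk, so treat the half-ordering comments as assumptions). Requires LASX to be enabled, e.g. with `-mlasx`:

```cpp
#include <lasxintrin.h>

__m256 roundTrip(__m128 a, __m128 b) {
  __m256 v = __lasx_concat_128_s(a, b);   // combine two 128-bit halves
  __m128 lo = __lasx_extract_128_lo_s(v); // pull the low half back out
  return __lasx_insert_128_hi_s(v, lo);   // overwrite the high half with it
}
```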
[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command:

```bash
git-clang-format --diff origin/main HEAD --extensions h,c -- clang/lib/Headers/lasxintrin.h clang/test/CodeGen/LoongArch/lasx/builtin-alias.c clang/test/CodeGen/LoongArch/lasx/builtin.c
```

:warning: The reproduction instructions above might return results for more than one PR in a stack if you are using a stacked PR workflow. You can limit the results by changing `origin/main` to the base branch/commit you want to compare against. :warning: View the diff from clang-format here.

```diff
diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h
index 6dd8ac24e..417671ffd 100644
--- a/clang/lib/Headers/lasxintrin.h
+++ b/clang/lib/Headers/lasxintrin.h
@@ -3885,8 +3885,8 @@ extern __inline
 #define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1)))
 
 extern __inline
-__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
-__lasx_cast_128_s(__m128 _1) {
+__attribute__((__gnu_inline__, __always_inline__,
+               __artificial__)) __m256 __lasx_cast_128_s(__m128 _1) {
   return (__m256)__builtin_lasx_cast_128_s((v4f32)_1);
 }
```

https://github.com/llvm/llvm-project/pull/157819
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/150170
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -3352,10 +3352,15 @@ class CodeGenFunction : public CodeGenTypeCache { SanitizerAnnotateDebugInfo(ArrayRef Ordinals, SanitizerHandler Handler); - /// Emit additional metadata used by the AllocToken instrumentation. + /// Emit metadata used by the AllocToken instrumentation. + llvm::MDNode *EmitAllocTokenHint(QualType AllocType); ojhunt wrote: I think this should be something like `BuildAllocTokenHint` -- also does llvm permit multiple nodes to share this hint? This is basically a "can this be cached and reused?" question - For TMO we needed to cache Type->descriptor to avoid any compile time regression - though I guess the TMO descriptors are more expensive to produce. https://github.com/llvm/llvm-project/pull/156842
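If the caching question is answered yes, one possible shape is the sketch below; the member placement, the cache key, and the `BuildAllocTokenHint` name (echoing the rename suggested above) are all assumptions:

```cpp
// Hypothetical cache, keyed on the canonical type:
llvm::DenseMap<const clang::Type *, llvm::MDNode *> AllocTokenHintCache;

llvm::MDNode *getOrBuildAllocTokenHint(clang::QualType AllocType) {
  const clang::Type *Key = AllocType.getCanonicalType().getTypePtr();
  llvm::MDNode *&Cached = AllocTokenHintCache[Key];
  if (!Cached)
    Cached = BuildAllocTokenHint(AllocType); // assumed builder function
  return Cached;
}
```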
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
easyonaadit wrote: ### Merge activity * **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150170). https://github.com/llvm/llvm-project/pull/150170
[llvm-branch-commits] [llvm] [AMDGPU] Extending wave reduction intrinsics for `i64` types - 2 (PR #151309)
easyonaadit wrote: ### Merge activity * **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151309). https://github.com/llvm/llvm-project/pull/151309
[llvm-branch-commits] [llvm] [AMDGPU] Extending wave reduction intrinsics for `i64` types - 3 (PR #151310)
easyonaadit wrote: ### Merge activity * **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/151310). https://github.com/llvm/llvm-project/pull/151310 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
https://github.com/easyonaadit updated https://github.com/llvm/llvm-project/pull/150170 >From 207c0b3f427403f0e504f9631f9d7523aecdb0a8 Mon Sep 17 00:00:00 2001 From: Aaditya Date: Sat, 19 Jul 2025 12:57:27 +0530 Subject: [PATCH] Add builtins for wave reduction intrinsics --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 25 ++ clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 58 +++ clang/test/CodeGenOpenCL/builtins-amdgcn.cl | 378 +++ 3 files changed, 461 insertions(+) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e5a1422fe8778..56b1a8dc09b15 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr") BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n") BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n") +//===--===// + +// Wave Reduction builtins. + +//===--===// + +BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc") +BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc") + //===--===// // R600-NI only builtins. 
//===--===// diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp index 87a46287c4022..07cf08c54985a 100644 --- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp +++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp @@ -295,11 +295,69 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst, Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs)); } +static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) { + switch (BuiltinID) { + default: +llvm_unreachable("Unknown BuiltinID for wave reduction"); + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64: +return Intrinsic::amdgcn_wave_reduce_add; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64: +return Intrinsic::amdgcn_wave_reduce_sub; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64: +return Intrinsic::amdgcn_wave_reduce_min; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64: +return Intrinsic::amdgcn_wave_reduce_umin; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64: +return Intrinsic::amdgcn_wave_reduce_max; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64: +return Intrinsic::amdgcn_wave_reduce_umax; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64: +return Intrinsic::amdgcn_wave_reduce_and; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64: +return Intrinsic::amdgcn_wave_reduce_or; + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32: + case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64: +return Intrinsic::amdgcn_wave_reduce_xor; + } +} + Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, const CallExpr *E) { llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent; llvm::SyncScope::ID SSID; switch (BuiltinID) { + case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32: + case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u
[llvm-branch-commits] [llvm] [AMDGPU] Propagate Constants for Wave Reduction Intrinsics (PR #150395)
easyonaadit wrote: ### Merge activity * **Sep 10, 10:47 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/150395). https://github.com/llvm/llvm-project/pull/150395 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (PR #157640)
Pierre-vh wrote: ### Merge activity * **Sep 10, 8:16 AM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/157640). https://github.com/llvm/llvm-project/pull/157640 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (PR #157845)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Petar Avramovic (petar-avramovic) Changes Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD. Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD should be always divergent. --- Full diff: https://github.com/llvm/llvm-project/pull/157845.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+8-7) - (modified) llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir (+12-8) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp index 5c958dfe6954f..398c99b3bd127 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp @@ -10281,7 +10281,7 @@ unsigned SIInstrInfo::getInstrLatency(const InstrItineraryData *ItinData, InstructionUniformity SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const { const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo(); - unsigned opcode = MI.getOpcode(); + unsigned Opcode = MI.getOpcode(); auto HandleAddrSpaceCast = [this, &MRI](const MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); @@ -10301,7 +10301,7 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const { // If the target supports globally addressable scratch, the mapping from // scratch memory to the flat aperture changes therefore an address space cast // is no longer uniform. - if (opcode == TargetOpcode::G_ADDRSPACE_CAST) + if (Opcode == TargetOpcode::G_ADDRSPACE_CAST) return HandleAddrSpaceCast(MI); if (auto *GI = dyn_cast(&MI)) { @@ -10329,7 +10329,8 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const { // // All other loads are not divergent, because if threads issue loads with the // same arguments, they will always get the same result. 
- if (opcode == AMDGPU::G_LOAD) { + if (Opcode == AMDGPU::G_LOAD || Opcode == AMDGPU::G_ZEXTLOAD || + Opcode == AMDGPU::G_SEXTLOAD) { if (MI.memoperands_empty()) return InstructionUniformity::NeverUniform; // conservative assumption @@ -10343,10 +10344,10 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const { return InstructionUniformity::Default; } - if (SIInstrInfo::isGenericAtomicRMWOpcode(opcode) || - opcode == AMDGPU::G_ATOMIC_CMPXCHG || - opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS || - AMDGPU::isGenericAtomic(opcode)) { + if (SIInstrInfo::isGenericAtomicRMWOpcode(Opcode) || + Opcode == AMDGPU::G_ATOMIC_CMPXCHG || + Opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS || + AMDGPU::isGenericAtomic(Opcode)) { return InstructionUniformity::NeverUniform; } return InstructionUniformity::Default; diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir index cb3c2de5b8753..d799cd2057f47 100644 --- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir +++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir @@ -46,13 +46,13 @@ body: | %6:_(p5) = G_IMPLICIT_DEF ; Atomic load -; CHECK-NOT: DIVERGENT - +; CHECK: DIVERGENT +; CHECK-SAME: G_ZEXTLOAD %0:_(s32) = G_ZEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`) ; flat load -; CHECK-NOT: DIVERGENT - +; CHECK: DIVERGENT +; CHECK-SAME: G_ZEXTLOAD %2:_(s32) = G_ZEXTLOAD %1(p0) :: (load (s16) from `ptr undef`) ; Gloabal load @@ -60,7 +60,8 @@ body: | %3:_(s32) = G_ZEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) undef`, addrspace 1) ; Private load -; CHECK-NOT: DIVERGENT +; CHECK: DIVERGENT +; CHECK-SAME: G_ZEXTLOAD %5:_(s32) = G_ZEXTLOAD %6(p5) :: (volatile load (s16) from `ptr addrspace(5) undef`, addrspace 5) G_STORE %2(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) undef`, addrspace 1) G_STORE %3(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) undef`, addrspace 1) @@ -80,11 +81,13 @@ body: | %6:_(p5) = G_IMPLICIT_DEF ; Atomic load -; CHECK-NOT: DIVERGENT +; CHECK: DIVERGENT +; CHECK-SAME: G_SEXTLOAD %0:_(s32) = G_SEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`) ; flat load -; CHECK-NOT: DIVERGENT +; CHECK: DIVERGENT +; CHECK-SAME: G_SEXTLOAD %2:_(s32) = G_SEXTLOAD %1(p0) :: (load (s16) from `ptr undef`) ; Gloabal load @@ -92,7 +95,8 @@ body: | %3:_(s32) = G_SEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) undef`, addrspace 1) ; Private load -; CHECK-NOT: DIVERGENT +; CHECK: DIVERGENT +; CHECK-SAME: G_SEXTLOAD %5:_(s32) = G_SEXTLOAD %6(p5) :: (volatile load (s16) from `ptr addrspace(5) undef`, addrspace 5) G_STORE %2(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) undef`, ad
[llvm-branch-commits] [llvm] [Remarks] Remove redundant size from StringRefs (NFC) (PR #156357)
https://github.com/tobias-stadler updated https://github.com/llvm/llvm-project/pull/156357 >From e3951bca5a4a5c169975f13faa679a761455976a Mon Sep 17 00:00:00 2001 From: Tobias Stadler Date: Mon, 1 Sep 2025 19:02:32 +0100 Subject: [PATCH] fix format Created using spr 1.3.7-wip --- llvm/include/llvm/Remarks/BitstreamRemarkContainer.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h b/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h index 2e378fd755588..48a148a3adc13 100644 --- a/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h +++ b/llvm/include/llvm/Remarks/BitstreamRemarkContainer.h @@ -96,7 +96,8 @@ constexpr StringLiteral MetaExternalFileName("External File"); constexpr StringLiteral RemarkHeaderName("Remark header"); constexpr StringLiteral RemarkDebugLocName("Remark debug location"); constexpr StringLiteral RemarkHotnessName("Remark hotness"); -constexpr StringLiteral RemarkArgWithDebugLocName("Argument with debug location"); +constexpr StringLiteral +RemarkArgWithDebugLocName("Argument with debug location"); constexpr StringLiteral RemarkArgWithoutDebugLocName("Argument"); } // end namespace remarks ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -26,6 +26,12 @@ # include #endif +#if defined(__POINTER_FIELD_PROTECTION__) +constexpr bool pfp_disabled = false; +#else +constexpr bool pfp_disabled = true; +#endif philnik777 wrote: Again, can we just disable the test instead? https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -1262,6 +1275,14 @@ typedef __char32_t char32_t; #define _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER 0 # endif +# if defined(__POINTER_FIELD_PROTECTION__) +#define _LIBCPP_PFP [[clang::pointer_field_protection]] +#define _LIBCPP_NO_PFP [[clang::no_field_protection]] philnik777 wrote: These should be _Uglified. Do these attributes do anything with pfp disabled? If not, why not simply check for their availability, as with other attributes? https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
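A sketch of the pattern being asked for, in the style libc++ uses for other attributes: gate the macros on `__has_cpp_attribute` and use uglified spellings. This assumes the attribute keeps its proposed name and grows an uglified `_Clang::` alias, neither of which is guaranteed by the PR as written.

```cpp
#if __has_cpp_attribute(_Clang::__pointer_field_protection__)
#  define _LIBCPP_PFP [[_Clang::__pointer_field_protection__]]
#  define _LIBCPP_NO_PFP [[_Clang::__no_field_protection__]]
#else
#  define _LIBCPP_PFP
#  define _LIBCPP_NO_PFP
#endif
```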
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -484,8 +484,21 @@ typedef __char32_t char32_t; #define _LIBCPP_EXCEPTIONS_SIG e # endif +# if !_LIBCPP_HAS_EXCEPTIONS +#define _LIBCPP_EXCEPTIONS_SIG n +# else +#define _LIBCPP_EXCEPTIONS_SIG e +# endif + +# if defined(__POINTER_FIELD_PROTECTION__) +#define _LIBCPP_PFP_SIG p +# else +#define _LIBCPP_PFP_SIG +# endif philnik777 wrote: My understanding is that pfp changes the layout of certain types? Why should there be an ABI tag for it? https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libc++] Add build and CI support for pointer field protection (PR #152414)
@@ -411,6 +411,42 @@ bootstrapping-build) ccache -s ;; +bootstrapping-build-pfp) philnik777 wrote: A bootstrapping build is incredibly heavy weight. Why is this required? https://github.com/llvm/llvm-project/pull/152414 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
@@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU( RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced); if (!AfterIP) return AfterIP.takeError(); + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); abidh wrote: ```suggestion Builder.restoreIP(*AfterIP); ``` https://github.com/llvm/llvm-project/pull/156837 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
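For readers unfamiliar with the API: the two spellings are equivalent, and `restoreIP` is the idiomatic shorthand for re-establishing a saved `IRBuilderBase::InsertPoint`. A minimal illustration (the function name is made up):

```cpp
#include "llvm/IR/IRBuilder.h"

// Both calls below leave the builder in the same state; restoreIP is the
// shorthand the suggestion above proposes.
void positionAfterInlinedRegion(llvm::IRBuilderBase &Builder,
                                llvm::IRBuilderBase::InsertPoint AfterIP) {
  Builder.SetInsertPoint(AfterIP.getBlock(), AfterIP.getPoint()); // hand-unpacked
  Builder.restoreIP(AfterIP);                                     // equivalent
}
```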
[llvm-branch-commits] [compiler-rt] Backport AArch64 sanitizer fixes to 21.x. (PR #157848)
llvmbot wrote: @llvm/pr-subscribers-compiler-rt-sanitizer Author: Michał Górny (mgorny) Changes Backport the following commits to 21.x branch: - 19cfc30 - 4485a3f - 6beb6f3 --- Full diff: https://github.com/llvm/llvm-project/pull/157848.diff 11 Files Affected: - (modified) compiler-rt/lib/gwp_asan/tests/basic.cpp (+6-5) - (modified) compiler-rt/lib/gwp_asan/tests/never_allocated.cpp (+6-4) - (modified) compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp (+1) - (modified) compiler-rt/test/cfi/cross-dso/lit.local.cfg.py (+4) - (modified) compiler-rt/test/dfsan/atomic.cpp (+5-2) - (modified) compiler-rt/test/lit.common.cfg.py (+17) - (modified) compiler-rt/test/msan/dtls_test.c (+1) - (modified) compiler-rt/test/sanitizer_common/TestCases/Linux/odd_stack_size.cpp (+1) - (modified) compiler-rt/test/sanitizer_common/TestCases/Linux/release_to_os_test.cpp (+3) - (modified) compiler-rt/test/sanitizer_common/TestCases/Linux/resize_tls_dynamic.cpp (+3) - (modified) compiler-rt/test/sanitizer_common/TestCases/Linux/tls_get_addr.c (+3) ``diff diff --git a/compiler-rt/lib/gwp_asan/tests/basic.cpp b/compiler-rt/lib/gwp_asan/tests/basic.cpp index 88e7ed14a5c2f..7d36a2ee1f947 100644 --- a/compiler-rt/lib/gwp_asan/tests/basic.cpp +++ b/compiler-rt/lib/gwp_asan/tests/basic.cpp @@ -65,11 +65,12 @@ TEST_F(DefaultGuardedPoolAllocator, NonPowerOfTwoAlignment) { // Added multi-page slots? You'll need to expand this test. TEST_F(DefaultGuardedPoolAllocator, TooBigForSinglePageSlots) { - EXPECT_EQ(nullptr, GPA.allocate(0x1001, 0)); - EXPECT_EQ(nullptr, GPA.allocate(0x1001, 1)); - EXPECT_EQ(nullptr, GPA.allocate(0x1001, 0x1000)); - EXPECT_EQ(nullptr, GPA.allocate(1, 0x2000)); - EXPECT_EQ(nullptr, GPA.allocate(0, 0x2000)); + size_t PageSize = sysconf(_SC_PAGESIZE); + EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, 0)); + EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, 1)); + EXPECT_EQ(nullptr, GPA.allocate(PageSize + 1, PageSize)); + EXPECT_EQ(nullptr, GPA.allocate(1, 2 * PageSize)); + EXPECT_EQ(nullptr, GPA.allocate(0, 2 * PageSize)); } TEST_F(CustomGuardedPoolAllocator, AllocAllSlots) { diff --git a/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp b/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp index 2f695b4379861..37a4b384e4ac0 100644 --- a/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp +++ b/compiler-rt/lib/gwp_asan/tests/never_allocated.cpp @@ -13,8 +13,10 @@ #include "gwp_asan/tests/harness.h" TEST_P(BacktraceGuardedPoolAllocatorDeathTest, NeverAllocated) { + size_t PageSize = sysconf(_SC_PAGESIZE); + SCOPED_TRACE(""); - void *Ptr = GPA.allocate(0x1000); + void *Ptr = GPA.allocate(PageSize); GPA.deallocate(Ptr); std::string DeathNeedle = @@ -23,7 +25,7 @@ TEST_P(BacktraceGuardedPoolAllocatorDeathTest, NeverAllocated) { // Trigger a guard page in a completely different slot that's never allocated. // Previously, there was a bug that this would result in nullptr-dereference // in the posix crash handler. 
- char *volatile NeverAllocatedPtr = static_cast(Ptr) + 0x3000; + char *volatile NeverAllocatedPtr = static_cast(Ptr) + 3 * PageSize; if (!Recoverable) { EXPECT_DEATH(*NeverAllocatedPtr = 0, DeathNeedle); return; @@ -37,8 +39,8 @@ TEST_P(BacktraceGuardedPoolAllocatorDeathTest, NeverAllocated) { GetOutputBuffer().clear(); for (size_t i = 0; i < 100; ++i) { *NeverAllocatedPtr = 0; -*(NeverAllocatedPtr + 0x2000) = 0; -*(NeverAllocatedPtr + 0x3000) = 0; +*(NeverAllocatedPtr + 2 * PageSize) = 0; +*(NeverAllocatedPtr + 3 * PageSize) = 0; ASSERT_TRUE(GetOutputBuffer().empty()); } diff --git a/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp b/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp index 3e28ffde46ab6..dc3ead9e8436c 100644 --- a/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp +++ b/compiler-rt/test/asan/TestCases/Linux/release_to_os_test.cpp @@ -6,6 +6,7 @@ // RUN: %env_asan_opts=allocator_release_to_os_interval_ms=-1 %run %t force 2>&1 | FileCheck %s --check-prefix=FORCE_RELEASE // REQUIRES: x86_64-target-arch +// REQUIRES: page-size-4096 #include #include diff --git a/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py b/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py index dceb7cde7218b..5f5486af3779f 100644 --- a/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py +++ b/compiler-rt/test/cfi/cross-dso/lit.local.cfg.py @@ -12,3 +12,7 @@ def getRoot(config): # Android O (API level 26) has support for cross-dso cfi in libdl.so. if config.android and "android-26" not in config.available_features: config.unsupported = True + +# The runtime library only supports 4K pages. +if "page-size-4096" not in config.available_features: +config.unsupported = True diff --git a/compiler-rt/test/dfsan/atomic.cpp b/compiler-rt/test/dfsan/atomic.cpp index 22ee323c752f8..73e1cbd17a7cd 100644 --- a/compiler-rt/test/dfsan/atomic.cpp +++ b/
[llvm-branch-commits] [llvm] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (PR #157845)
https://github.com/Pierre-vh approved this pull request. https://github.com/llvm/llvm-project/pull/157845 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libcxx] [libc++] Add ABI flag to make __tree nodes more compact (PR #147681)
@@ -98,6 +99,8 @@ # endif #endif +#define _LIBCPP_ABI_TREE_POINTER_INT_PAIR ldionne wrote: Let's add some documentation for this. Also (or only?) in the `.rst` docs? https://github.com/llvm/llvm-project/pull/147681 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/155993 >From e36db5923f8122cc56a99461b3e0030e06071a5d Mon Sep 17 00:00:00 2001 From: ergawy Date: Fri, 29 Aug 2025 04:04:07 -0500 Subject: [PATCH] [flang][do concurent] Add saxpy offload tests for OpenMP mapping Adds end-to-end tests for `do concurrent` offloading to the device. --- .../fortran/do-concurrent-to-omp-saxpy-2d.f90 | 53 +++ .../fortran/do-concurrent-to-omp-saxpy.f90| 53 +++ 2 files changed, 106 insertions(+) create mode 100644 offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 create mode 100644 offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 new file mode 100644 index 0..c6f576acb90b6 --- /dev/null +++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 @@ -0,0 +1,53 @@ +! REQUIRES: flang, amdgpu + +! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device +! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic +module saxpymod + use iso_fortran_env + public :: saxpy +contains + +subroutine saxpy(a, x, y, n, m) + use iso_fortran_env + implicit none + integer,intent(in) :: n, m + real(kind=real32),intent(in) :: a + real(kind=real32), dimension(:,:),intent(in) :: x + real(kind=real32), dimension(:,:),intent(inout) :: y + integer :: i, j + + do concurrent(i=1:n, j=1:m) + y(i,j) = a * x(i,j) + y(i,j) + end do + + write(*,*) "plausibility check:" + write(*,'("y(1,1) ",f8.6)') y(1,1) + write(*,'("y(n,m) ",f8.6)') y(n,m) +end subroutine saxpy + +end module saxpymod + +program main + use iso_fortran_env + use saxpymod, ONLY:saxpy + implicit none + + integer,parameter :: n = 1000, m=1 + real(kind=real32), allocatable, dimension(:,:) :: x, y + real(kind=real32) :: a + integer :: i + + allocate(x(1:n,1:m), y(1:n,1:m)) + a = 2.0_real32 + x(:,:) = 1.0_real32 + y(:,:) = 2.0_real32 + + call saxpy(a, x, y, n, m) + + deallocate(x,y) +end program main + +! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}} +! CHECK: plausibility check: +! CHECK: y(1,1) 4.0 +! CHECK: y(n,m) 4.0 diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 new file mode 100644 index 0..e094a1d7459ef --- /dev/null +++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 @@ -0,0 +1,53 @@ +! REQUIRES: flang, amdgpu + +! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device +! 
RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic +module saxpymod + use iso_fortran_env + public :: saxpy +contains + +subroutine saxpy(a, x, y, n) + use iso_fortran_env + implicit none + integer,intent(in) :: n + real(kind=real32),intent(in) :: a + real(kind=real32), dimension(:),intent(in) :: x + real(kind=real32), dimension(:),intent(inout) :: y + integer :: i + + do concurrent(i=1:n) + y(i) = a * x(i) + y(i) + end do + + write(*,*) "plausibility check:" + write(*,'("y(1) ",f8.6)') y(1) + write(*,'("y(n) ",f8.6)') y(n) +end subroutine saxpy + +end module saxpymod + +program main + use iso_fortran_env + use saxpymod, ONLY:saxpy + implicit none + + integer,parameter :: n = 1000 + real(kind=real32), allocatable, dimension(:) :: x, y + real(kind=real32) :: a + integer :: i + + allocate(x(1:n), y(1:n)) + a = 2.0_real32 + x(:) = 1.0_real32 + y(:) = 2.0_real32 + + call saxpy(a, x, y, n) + + deallocate(x,y) +end program main + +! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}} +! CHECK: plausibility check: +! CHECK: y(1) 4.0 +! CHECK: y(n) 4.0 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156837 >From c7d655214b726335a36eb0a9449b5d14df3699e9 Mon Sep 17 00:00:00 2001 From: ergawy Date: Thu, 4 Sep 2025 01:06:21 -0500 Subject: [PATCH] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used resulting in emitting BBs with multiple terminators. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 + .../omptarget-multi-block-reduction.mlir | 85 +++ 2 files changed, 88 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 3d5e487c8990f..fe00a2a5696dc 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -3506,6 +3506,8 @@ Expected OpenMPIRBuilder::createReductionFunction( return AfterIP.takeError(); if (!Builder.GetInsertBlock()) return ReductionFunc; + + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHSPtr); } } @@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU( RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced); if (!AfterIP) return AfterIP.takeError(); + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHS, false); } } diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir new file mode 100644 index 0..aaf06d2d0e0c2 --- /dev/null +++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir @@ -0,0 +1,85 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Verifies that the IR builder can handle reductions with multi-block combiner +// regions on the GPU. 
+ +module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} { + llvm.func @bar() {} + llvm.func @baz() {} + + omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc { +%0 = llvm.mlir.constant(1 : i64) : i64 +%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5> +%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr +omp.yield(%2 : !llvm.ptr) + } init { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +omp.yield(%arg1 : !llvm.ptr) + } combiner { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +llvm.call @bar() : () -> () +llvm.br ^bb3 + + ^bb3: // pred: ^bb1 +llvm.call @baz() : () -> () +omp.yield(%arg0 : !llvm.ptr) + } + llvm.func @foo_() { +%c1 = llvm.mlir.constant(1 : i64) : i64 +%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5> +%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr +%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"} +omp.target map_entries(%74 -> %arg0 : !llvm.ptr) { + %c1_2 = llvm.mlir.constant(1 : i32) : i32 + %c10 = llvm.mlir.constant(10 : i32) : i32 + omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) { +omp.parallel { + omp.distribute { +omp.wsloop { + omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) { +omp.yield + } +} {omp.composite} + } {omp.composite} + omp.terminator +} {omp.composite} +omp.terminator + } + omp.terminator +} +llvm.return + } +} + +// CHECK: call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1, +// CHECK-SAME: ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1) + +// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} { +// CHECK: .omp.reduction.then: +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar() +// CHECK: br label %[[BODY_2ND_BB:.*]] + +// CHECK: [[BODY_2ND_BB]]: +// CHECK: call void @baz() +// CHECK: br label %[[CONT_BB:.*]] + +// CHECK: [[CONT_BB]]: +// CHECK: br label %.omp.reduction.done +// CHECK: } + +// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef %0, ptr noundef %1) #0 { +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: [[BODY_2ND_BB:.*]]: +// CHECK: call void @baz() +// CHECK: br label %omp.region.cont + + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar()
[llvm-branch-commits] [libc++] Add build and CI support for pointer field protection (PR #152414)
@@ -411,6 +411,42 @@ bootstrapping-build) ccache -s ;; +bootstrapping-build-pfp) pcc wrote: It's required because the PFP support in the compiler is experimental and brand new, so it won't exist in compilers that are already installed on the target system. Once PFP becomes a stable feature that is supported in released compilers, we may convert this to a non-bootstrapping build. https://github.com/llvm/llvm-project/pull/152414 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add IR and codegen support for deactivation symbols. (PR #133536)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133536 >From f4c61b403c8a2c649741bae983196922143db44e Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Wed, 10 Sep 2025 18:02:38 -0700 Subject: [PATCH] Tweak LangRef Created using spr 1.3.6-beta.1 --- llvm/docs/LangRef.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 10586f03cff8e..5380413aec892 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -3098,7 +3098,8 @@ Deactivation Symbol Operand Bundles A ``"deactivation-symbol"`` operand bundle is valid on the following instructions (AArch64 only): -- Call to a normal function with ``notail`` attribute. +- Call to a normal function with ``notail`` attribute and a first argument and + return value of type ``ptr``. - Call to ``llvm.ptrauth.sign`` or ``llvm.ptrauth.auth`` intrinsics. This operand bundle specifies that if the deactivation symbol is defined ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
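A sketch of how a front end might emit such a call through `IRBuilder`, under the semantics the LangRef hunk above describes. The `"deactivation-symbol"` bundle tag and the notail/`ptr` constraints come from this PR; the helper function itself is invented for illustration.

```cpp
#include "llvm/IR/IRBuilder.h"

// Emit a call carrying a "deactivation-symbol" operand bundle. Per the
// LangRef text above, the callee takes and returns ptr and must be notail.
llvm::CallInst *emitDeactivatableCall(llvm::IRBuilderBase &B,
                                      llvm::FunctionCallee Callee,
                                      llvm::Value *PtrArg,
                                      llvm::Value *DeactivationSym) {
  llvm::OperandBundleDef DS("deactivation-symbol", {DeactivationSym});
  llvm::CallInst *CI = B.CreateCall(Callee, {PtrArg}, {DS});
  CI->setTailCallKind(llvm::CallInst::TCK_NoTail); // the required 'notail'
  return CI;
}
```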
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133537 >From e728f3444624a5f47f0af84c21fb3a584f3e05b7 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Fri, 1 Aug 2025 17:27:41 -0700 Subject: [PATCH 1/4] Add verifier check Created using spr 1.3.6-beta.1 --- llvm/lib/IR/Verifier.cpp | 5 + llvm/test/Verifier/ptrauth-constant.ll | 6 ++ 2 files changed, 11 insertions(+) create mode 100644 llvm/test/Verifier/ptrauth-constant.ll diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 3ff9895e161c4..3478c2c450ae7 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2627,6 +2627,11 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) { Check(CPA->getDiscriminator()->getBitWidth() == 64, "signed ptrauth constant discriminator must be i64 constant integer"); + + Check(isa(CPA->getDeactivationSymbol()) || +CPA->getDeactivationSymbol()->isNullValue(), +"signed ptrauth constant deactivation symbol must be a global value " +"or null"); } bool Verifier::verifyAttributeCount(AttributeList Attrs, unsigned Params) { diff --git a/llvm/test/Verifier/ptrauth-constant.ll b/llvm/test/Verifier/ptrauth-constant.ll new file mode 100644 index 0..fdd6352cf8469 --- /dev/null +++ b/llvm/test/Verifier/ptrauth-constant.ll @@ -0,0 +1,6 @@ +; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s + +@g = external global i8 + +; CHECK: signed ptrauth constant deactivation symbol must be a global variable or null +@ptr = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr inttoptr (i64 16 to ptr)) >From 60e836e71bf9aabe9dade2bda1ca38107f76b599 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Mon, 8 Sep 2025 17:34:59 -0700 Subject: [PATCH 2/4] Address review comment Created using spr 1.3.6-beta.1 --- llvm/lib/IR/Constants.cpp | 1 + llvm/test/Assembler/invalid-ptrauth-const6.ll | 6 ++ 2 files changed, 7 insertions(+) create mode 100644 llvm/test/Assembler/invalid-ptrauth-const6.ll diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 5eacc7af1269b..53b292f90c03d 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2082,6 +2082,7 @@ ConstantPtrAuth::ConstantPtrAuth(Constant *Ptr, ConstantInt *Key, assert(Key->getBitWidth() == 32); assert(Disc->getBitWidth() == 64); assert(AddrDisc->getType()->isPointerTy()); + assert(DeactivationSymbol->getType()->isPointerTy()); setOperand(0, Ptr); setOperand(1, Key); setOperand(2, Disc); diff --git a/llvm/test/Assembler/invalid-ptrauth-const6.ll b/llvm/test/Assembler/invalid-ptrauth-const6.ll new file mode 100644 index 0..6e8e1d386acc8 --- /dev/null +++ b/llvm/test/Assembler/invalid-ptrauth-const6.ll @@ -0,0 +1,6 @@ +; RUN: not llvm-as < %s 2>&1 | FileCheck %s + +@var = global i32 0 + +; CHECK: error: constant ptrauth deactivation symbol must be a pointer +@ptr = global ptr ptrauth (ptr @var, i32 0, i64 65535, ptr null, i64 0) >From a780d181fa69236d5909759a24a1134b50313980 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Tue, 9 Sep 2025 17:18:49 -0700 Subject: [PATCH 3/4] Address review comment Created using spr 1.3.6-beta.1 --- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 3 +++ llvm/lib/IR/Verifier.cpp | 3 +++ 2 files changed, 6 insertions(+) diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 045ed204620fb..04fe4c57af6ed 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -1613,6 +1613,9 @@ Expected BitcodeReader::materializeValue(unsigned 
StartValID, ConstOps.size() > 4 ? ConstOps[4] : ConstantPointerNull::get(cast( ConstOps[3]->getType())); + if (DeactivationSymbol->getType()->isPointerTy()) +return error( +"ptrauth deactivation symbol operand must be a pointer"); C = ConstantPtrAuth::get(ConstOps[0], Key, Disc, ConstOps[3], DeactivationSymbol); diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 9e44dfb387615..a53ba17e26011 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2632,6 +2632,9 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) { Check(CPA->getDiscriminator()->getBitWidth() == 64, "signed ptrauth constant discriminator must be i64 constant integer"); + Check(CPA->getDeactivationSymbol()->getType()->isPointerTy(), +"signed ptrauth constant deactivation symbol must be a pointer"); + Check(isa(CPA->getDeactivationSymbol()) || CPA->getDeactivationSymbol()->isNullValue(), "signed ptrauth constant deactivation symbol must be a global value " >From 51c353bbde24f940e3dfd7488aec0682dbef260b Mon Se
[llvm-branch-commits] Add llvm.protected.field.ptr intrinsic and pre-ISel lowering. (PR #151647)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/151647 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/157638 >From 723193bcd43fc0be3e3e18b95e35d2ac8226aa18 Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 05:54:00 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and uses the mapped value as the `private` clause operand in the nested `omp.parallel` op. --- .../include/flang/Optimizer/Dialect/FIROps.td | 12 ++ .../OpenMP/DoConcurrentConversion.cpp | 192 +++--- .../Transforms/DoConcurrent/local_device.mlir | 49 + 3 files changed, 175 insertions(+), 78 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index bc971e8fd6600..fc6eedc6ed4c6 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop", return getReduceVars().size(); } +unsigned getInductionVarsStart() { + return 0; +} + +unsigned getLocalOperandsStart() { + return getNumInductionVars(); +} + +unsigned getReduceOperandsStart() { + return getLocalOperandsStart() + getNumLocalOperands(); +} + mlir::Block::BlockArgListType getInductionVars() { return getBody()->getArguments().slice(0, getNumInductionVars()); } diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 6c71924000842..d00a4fdd2cf2e 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, liveIns.push_back(operand->get()); }); + + for (mlir::Value local : loop.getLocalVars()) +liveIns.push_back(local); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -298,8 +301,7 @@ class DoConcurrentConversion .getIsTargetDevice(); mlir::omp::TargetOperands targetClauseOps; - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps, + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps, isTargetDevice ? nullptr : &targetClauseOps); LiveInShapeInfoMap liveInShapeInfoMap; @@ -321,14 +323,13 @@ class DoConcurrentConversion } mlir::omp::ParallelOp parallelOp = -genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper); +genParallelOp(rewriter, loop, ivInfos, mapper); // Only set as composite when part of `distribute parallel do`. parallelOp.setComposite(mapToDevice); if (!mapToDevice) - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps); + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps); for (mlir::Value local : locals) looputils::localizeLoopLocalValue(local, parallelOp.getRegion(), @@ -337,10 +338,38 @@ class DoConcurrentConversion if (mapToDevice) genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true); -mlir::omp::LoopNestOp ompLoopNest = +auto [loopNestOp, wsLoopOp] = genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps, /*isComposite=*/mapToDevice); +// `local` region arguments are transferred/cloned from the `do concurrent` +// loop to the loopnest op when the region is cloned above. Instead, these +// region arguments should be on the workshare loop's region. 
+if (mapToDevice) { + for (auto [parallelArg, loopNestArg] : llvm::zip_equal( + parallelOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getLocalOperandsStart(), loop.getNumLocalOperands( +rewriter.replaceAllUsesWith(loopNestArg, parallelArg); + + for (auto [wsloopArg, loopNestArg] : llvm::zip_equal( + wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getReduceOperandsStart(), loop.getNumReduceOperands( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} else { + for (auto [wsloopArg, loopNestArg] : + llvm::zip_equal(wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().drop_front( + loopNestClauseOps.loopLowerBounds.size( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} + +for (unsigned i = 0; + i
[llvm-branch-commits] [libcxx] [libc++] Add ABI flag to make __tree nodes more compact (PR #147681)
https://github.com/ldionne commented: LGTM but let's A/B measure this to see whether there is a visible impact. I'm especially looking for a regression caused by more expensive pointer chasing since we have to "decode" the pointer now. If we don't see issues with this, I think I'd be OK with making this the new "de facto" ABI for v2 unconditionally. Also, this obviously needs `pointer_int_pair` to land. https://github.com/llvm/llvm-project/pull/147681 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
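The "decode" cost in question is the masking needed on every dereference once the color bit lives in the low alignment bits of the node pointer. A self-contained sketch of that packing, mirroring `llvm::PointerIntPair`; the actual encoding used by the `__tree` patch may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Pack an integer into the spare low bits of an aligned pointer. Reading the
// pointer back requires a mask -- the extra work the review comment wants to
// measure.
template <class T, unsigned IntBits>
class PointerIntPair {
  static_assert(alignof(T) >= (1u << IntBits), "not enough spare low bits");
  static constexpr std::uintptr_t IntMask = (1u << IntBits) - 1;
  std::uintptr_t Value = 0;

public:
  T *getPointer() const { return reinterpret_cast<T *>(Value & ~IntMask); }
  unsigned getInt() const { return static_cast<unsigned>(Value & IntMask); }
  void set(T *P, unsigned I) {
    assert((reinterpret_cast<std::uintptr_t>(P) & IntMask) == 0 && I <= IntMask);
    Value = reinterpret_cast<std::uintptr_t>(P) | I;
  }
};
```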
[llvm-branch-commits] [llvm] [AllocToken, Clang] Implement __builtin_alloc_token_infer() and llvm.alloc.token.id (PR #156842)
@@ -1274,6 +1274,12 @@ def AllocaWithAlignUninitialized : Builtin { let Prototype = "void*(size_t, _Constant size_t)"; } +def AllocTokenInfer : Builtin { + let Spellings = ["__builtin_alloc_token_infer"]; ojhunt wrote: I think `__builtin_infer_alloc_token` sounds better. I can't think of a way to easily infer the type from return values :-/ A developer can work with out-parameters though, at least in the macro use case: `infer(*out_param_expr)`. https://github.com/llvm/llvm-project/pull/156842 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
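A hypothetical sketch of the out-parameter use case mentioned above. The spelling `__builtin_infer_alloc_token` is the one suggested in the comment, its signature (taking a use-site expression) is an assumption, and `my_alloc_with_token` is an invented runtime entry point; none of these are confirmed by the PR.

```cpp
#include <cstddef>

// Assumed allocator entry point taking an explicit token id (illustrative).
void *my_alloc_with_token(std::size_t size, unsigned long token);

struct Node { Node *next; };

void example(Node **out) {
  // Infer the token from the pointee type of the out-parameter expression,
  // as suggested for the macro use case.
  *out = static_cast<Node *>(
      my_alloc_with_token(sizeof(Node), __builtin_infer_alloc_token(*out)));
}
```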
[llvm-branch-commits] [clang] [clang-tools-extra] [compiler-rt] [libcxx] [libcxxabi] [libunwind] [lldb] [llvm] [mlir] [openmp] release/21.x: [CMake][AIX] quote the string AIX `if` conditions (PR #1565
daltenty wrote: > Uhm - this looks pretty big and seems like something that can easily break > certain build configurations since it doesn't seem to touch only AIX Agreed that this looks big and scary, but it's a purely mechanical change that is a no-op for most targets. I'll add a long-form rationale at the end of this comment about why I don't think the patch affects anyone but AIX, to keep my answers brief. > Is this in main without any issues? Yes, these patches have been in main for several weeks at this point with no reported issues. > Does it really NEED to be merged for the release branch at this point? It would help us out for the point releases. Without this patch, we're unable to build on AIX with CMake from our package manager (4.0). We can manually downgrade if we're unwilling to take this patch. **Rationale for why the patch doesn't affect targets besides AIX** We quote the string AIX and variable expansions which might expand to the string AIX (i.e. `CMAKE_SYSTEM_NAME`), so that we do the intended string comparison. If not quoted, the `if` will expand the string when it happens to match a variable name (which `AIX` does in CMake 4.0+). This has an effect only if `CMAKE_SYSTEM_NAME` (https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_NAME.html) expands to something which is also a CMake variable (https://cmake.org/cmake/help/latest/manual/cmake-variables.7.html#variables-that-describe-the-system). Intersecting the two lists gives the following affected targets: ``` AIX CYGWIN MSYS WASI ``` Of those targets, only CYGWIN appears in the lines affected by the patch, and it's already using a variable check (i.e. it checks `CYGWIN`) rather than a string comparison to `CMAKE_SYSTEM_NAME`, so it's unaffected. https://github.com/llvm/llvm-project/pull/156505 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -256,9 +256,12 @@ void unique_ptr_test() { ComparePrettyPrintToRegex(std::move(forty_two), R"(std::unique_ptr containing = {__ptr_ = 0x[a-f0-9]+})"); +#if !defined(__POINTER_FIELD_PROTECTION__) + // GDB doesn't know how to read PFP fields correctly yet. pcc wrote: The support for this feature in GCC is independent of support in GDB. We could imagine debug info extensions being developed in the future to make it possible for this to pass in GDB even without GCC support. That being said, disabling the test is also fine with me. https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -256,9 +256,12 @@ void unique_ptr_test() { ComparePrettyPrintToRegex(std::move(forty_two), R"(std::unique_ptr containing = {__ptr_ = 0x[a-f0-9]+})"); +#if !defined(__POINTER_FIELD_PROTECTION__) + // GDB doesn't know how to read PFP fields correctly yet. philnik777 wrote: Does GCC have pfp in general? If not, IMO we should just disable the pretty printer test with pfp enabled. https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [MC] Rewrite stdin.s to use python (PR #157232)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157232 >From d749f30964e57caa797b3df87ae88ffc3d4a2f54 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Sun, 7 Sep 2025 17:39:19 + Subject: [PATCH 1/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 17 + llvm/test/MC/COFF/stdin.s | 1 - 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 llvm/test/MC/COFF/stdin.py delete mode 100644 llvm/test/MC/COFF/stdin.s diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py new file mode 100644 index 0..8b7b6ae1fba13 --- /dev/null +++ b/llvm/test/MC/COFF/stdin.py @@ -0,0 +1,17 @@ +# RUN: echo "// comment" > %t.input +# RUN: which llvm-mc | %python %s %t + +import subprocess +import sys + +llvm_mc_binary = sys.stdin.readlines()[0].strip() +temp_file = sys.argv[1] +input_file = temp_file + ".input" + +with open(temp_file, "w") as mc_stdout: +mc_stdout.seek(4) +subprocess.run( +[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +stdout=mc_stdout, +check=True, +) diff --git a/llvm/test/MC/COFF/stdin.s b/llvm/test/MC/COFF/stdin.s deleted file mode 100644 index 8ceae7fdef501..0 --- a/llvm/test/MC/COFF/stdin.s +++ /dev/null @@ -1 +0,0 @@ -// RUN: bash -c '(echo "test"; llvm-mc -filetype=obj -triple i686-pc-win32 %s ) > %t' >From 0bfe954d4cd5edf4312e924c278c59e57644d5f1 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:28:59 + Subject: [PATCH 2/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 8b7b6ae1fba13..1d9b50c022523 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,14 +1,22 @@ # RUN: echo "// comment" > %t.input # RUN: which llvm-mc | %python %s %t +import argparse import subprocess import sys +parser = argparse.ArgumentParser() +parser.add_argument("temp_file") +arguments = parser.parse_args() + llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = sys.argv[1] +temp_file = arguments.temp_file input_file = temp_file + ".input" with open(temp_file, "w") as mc_stdout: +## We need to test that starting on an input stream with a non-zero offset +## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek +## past zero for STDOUT. 
mc_stdout.seek(4) subprocess.run( [llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], >From 2ae17e4f18a95c52b53ad5ad45a19c4bf29e5025 Mon Sep 17 00:00:00 2001 From: Aiden Grossman Date: Mon, 8 Sep 2025 17:43:39 + Subject: [PATCH 3/3] feedback Created using spr 1.3.6 --- llvm/test/MC/COFF/stdin.py | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/llvm/test/MC/COFF/stdin.py b/llvm/test/MC/COFF/stdin.py index 1d9b50c022523..0da1b4895142b 100644 --- a/llvm/test/MC/COFF/stdin.py +++ b/llvm/test/MC/COFF/stdin.py @@ -1,25 +1,30 @@ # RUN: echo "// comment" > %t.input -# RUN: which llvm-mc | %python %s %t +# RUN: which llvm-mc | %python %s %t.input %t import argparse import subprocess import sys parser = argparse.ArgumentParser() +parser.add_argument("input_file") parser.add_argument("temp_file") arguments = parser.parse_args() llvm_mc_binary = sys.stdin.readlines()[0].strip() -temp_file = arguments.temp_file -input_file = temp_file + ".input" -with open(temp_file, "w") as mc_stdout: +with open(arguments.temp_file, "w") as mc_stdout: ## We need to test that starting on an input stream with a non-zero offset ## does not trigger an assertion in WinCOFFObjectWriter.cpp, so we seek ## past zero for STDOUT. mc_stdout.seek(4) subprocess.run( -[llvm_mc_binary, "-filetype=obj", "-triple", "i686-pc-win32", input_file], +[ +llvm_mc_binary, +"-filetype=obj", +"-triple", +"i686-pc-win32", +arguments.input_file, +], stdout=mc_stdout, check=True, ) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterle… (PR #158013)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/158013 …aveGroups. Track which ops already have been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load > single-scalar load. Fixes thttps://github.com/llvm/llvm-project/issues/156190. >From 93505953fea754e6bbb1edb5fca75097132377b5 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 10 Sep 2025 17:09:49 +0100 Subject: [PATCH] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. Track which ops already have been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load > single-scalar load. Fixes thttps://github.com/llvm/llvm-project/issues/156190. --- .../Transforms/Vectorize/VPlanTransforms.cpp | 8 +- ...nterleave-to-widen-memory-with-wide-ops.ll | 79 +++ ...sform-narrow-interleave-to-widen-memory.ll | 73 + 3 files changed, 158 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp index 6a3b3e6e41955..f7c1c10185c68 100644 --- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp +++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp @@ -3252,9 +3252,10 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, return; // Convert InterleaveGroup \p R to a single VPWidenLoadRecipe. - auto NarrowOp = [](VPValue *V) -> VPValue * { + SmallPtrSet NarrowedOps; + auto NarrowOp = [&NarrowedOps](VPValue *V) -> VPValue * { auto *R = V->getDefiningRecipe(); -if (!R) +if (!R || NarrowedOps.contains(V)) return V; if (auto *LoadGroup = dyn_cast(R)) { // Narrow interleave group to wide load, as transformed VPlan will only @@ -3264,6 +3265,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, LoadGroup->getAddr(), LoadGroup->getMask(), /*Consecutive=*/true, /*Reverse=*/false, {}, LoadGroup->getDebugLoc()); L->insertBefore(LoadGroup); + NarrowedOps.insert(L); return L; } @@ -3271,6 +3273,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, assert(RepR->isSingleScalar() && isa(RepR->getUnderlyingInstr()) && "must be a single scalar load"); + NarrowedOps.insert(RepR); return RepR; } auto *WideLoad = cast(R); @@ -3281,6 +3284,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, WideLoad->operands(), /*IsUniform*/ true, /*Mask*/ nullptr, *WideLoad); N->insertBefore(WideLoad); +NarrowedOps.insert(N); return N; }; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll index 813d61b52100f..aec6c0be6dde2 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll @@ -1203,3 +1203,82 @@ loop: exit: ret void } + +; Make sure multiple uses of a narrowed op are handled correctly, +; https://github.com/llvm/llvm-project/issues/156190. 
+define void @multiple_store_groups_storing_same_wide_bin_op(ptr noalias %A, ptr noalias %B, ptr noalias %C) { +; VF2-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op( +; VF2-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]]) { +; VF2-NEXT: [[ENTRY:.*:]] +; VF2-NEXT:br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] +; VF2: [[VECTOR_PH]]: +; VF2-NEXT:br label %[[VECTOR_BODY:.*]] +; VF2: [[VECTOR_BODY]]: +; VF2-NEXT:[[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ] +; VF2-NEXT:[[TMP0:%.*]] = getelementptr { double, double }, ptr [[A]], i64 [[INDEX]] +; VF2-NEXT:[[BROADCAST_SPLAT:%.*]] = load <2 x double>, ptr [[TMP0]], align 8 +; VF2-NEXT:[[TMP2:%.*]] = fadd contract <2 x double> [[BROADCAST_SPLAT]], splat (double 2.00e+01) +; VF2-NEXT:[[TMP3:%.*]] = getelementptr { double, double }, ptr [[B]], i64 [[INDEX]] +; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP3]], align 8 +; VF2-NEXT:[[TMP4:%.*]] = getelementptr { double, double }, ptr [[C]], i64 [[INDEX]] +; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP4]], align 8 +; VF2-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1 +; VF2-NEXT:[[TMP5:%.*]] = icmp eq i64 [[IN
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -26,6 +26,12 @@ # include #endif +#if defined(__POINTER_FIELD_PROTECTION__) +constexpr bool pfp_disabled = false; +#else +constexpr bool pfp_disabled = true; +#endif pcc wrote: That's fine with me I suppose. The correct result for `__libcpp_is_trivially_relocatable` is implicitly tested by the other tests (which would crash if it was wrong). https://github.com/llvm/llvm-project/pull/151651 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/158013 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)
https://github.com/fhahn milestoned https://github.com/llvm/llvm-project/pull/158013 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)
llvmbot wrote: @llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) Changes Track which ops already have been narrowed, to avoid narrowing the same operation multiple times. Repeated narrowing will lead to incorrect results, because we could first narrow from an interleave group -> wide load, and then narrow the wide load > single-scalar load. Fixes thttps://github.com/llvm/llvm-project/issues/156190. --- Full diff: https://github.com/llvm/llvm-project/pull/158013.diff 3 Files Affected: - (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+6-2) - (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+79) - (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+73) ``diff diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp index 6a3b3e6e41955..f7c1c10185c68 100644 --- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp +++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp @@ -3252,9 +3252,10 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, return; // Convert InterleaveGroup \p R to a single VPWidenLoadRecipe. - auto NarrowOp = [](VPValue *V) -> VPValue * { + SmallPtrSet NarrowedOps; + auto NarrowOp = [&NarrowedOps](VPValue *V) -> VPValue * { auto *R = V->getDefiningRecipe(); -if (!R) +if (!R || NarrowedOps.contains(V)) return V; if (auto *LoadGroup = dyn_cast(R)) { // Narrow interleave group to wide load, as transformed VPlan will only @@ -3264,6 +3265,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, LoadGroup->getAddr(), LoadGroup->getMask(), /*Consecutive=*/true, /*Reverse=*/false, {}, LoadGroup->getDebugLoc()); L->insertBefore(LoadGroup); + NarrowedOps.insert(L); return L; } @@ -3271,6 +3273,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, assert(RepR->isSingleScalar() && isa(RepR->getUnderlyingInstr()) && "must be a single scalar load"); + NarrowedOps.insert(RepR); return RepR; } auto *WideLoad = cast(R); @@ -3281,6 +3284,7 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF, WideLoad->operands(), /*IsUniform*/ true, /*Mask*/ nullptr, *WideLoad); N->insertBefore(WideLoad); +NarrowedOps.insert(N); return N; }; diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll index 813d61b52100f..aec6c0be6dde2 100644 --- a/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll +++ b/llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll @@ -1203,3 +1203,82 @@ loop: exit: ret void } + +; Make sure multiple uses of a narrowed op are handled correctly, +; https://github.com/llvm/llvm-project/issues/156190. 
+define void @multiple_store_groups_storing_same_wide_bin_op(ptr noalias %A, ptr noalias %B, ptr noalias %C) { +; VF2-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op( +; VF2-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]]) { +; VF2-NEXT: [[ENTRY:.*:]] +; VF2-NEXT:br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] +; VF2: [[VECTOR_PH]]: +; VF2-NEXT:br label %[[VECTOR_BODY:.*]] +; VF2: [[VECTOR_BODY]]: +; VF2-NEXT:[[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ] +; VF2-NEXT:[[TMP0:%.*]] = getelementptr { double, double }, ptr [[A]], i64 [[INDEX]] +; VF2-NEXT:[[BROADCAST_SPLAT:%.*]] = load <2 x double>, ptr [[TMP0]], align 8 +; VF2-NEXT:[[TMP2:%.*]] = fadd contract <2 x double> [[BROADCAST_SPLAT]], splat (double 2.00e+01) +; VF2-NEXT:[[TMP3:%.*]] = getelementptr { double, double }, ptr [[B]], i64 [[INDEX]] +; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP3]], align 8 +; VF2-NEXT:[[TMP4:%.*]] = getelementptr { double, double }, ptr [[C]], i64 [[INDEX]] +; VF2-NEXT:store <2 x double> [[TMP2]], ptr [[TMP4]], align 8 +; VF2-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1 +; VF2-NEXT:[[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000 +; VF2-NEXT:br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]] +; VF2: [[MIDDLE_BLOCK]]: +; VF2-NEXT:br i1 true, [[EXIT:label %.*]], label %[[SCALAR_PH]] +; VF2: [[SCALAR_PH]]: +; +; VF4-LABEL: define void @multiple_store_groups_storing_same_wide_bin_op( +; VF4-SAME: ptr noalias
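To see the memoization pattern the patch applies in isolation, here is a minimal standalone C++ sketch, not LLVM's VPlan API: the `Op`, `narrowOp`, and `Pool` names are hypothetical. The point is that recording each narrowing result in a set makes a second visit return the already-narrowed op instead of narrowing it again (interleave group -> wide load -> single-scalar load).

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

struct Op {
  std::string Kind;
};

// Narrow an operand one step and record the result in NarrowedOps, so that a
// later visit (e.g. via a second store group using the same operand) returns
// the already-narrowed op instead of narrowing it a second time.
static Op *narrowOp(Op *V, std::unordered_set<Op *> &NarrowedOps,
                    std::vector<std::unique_ptr<Op>> &Pool) {
  if (NarrowedOps.count(V))
    return V; // already narrowed on a previous visit, do not narrow again
  Pool.push_back(std::make_unique<Op>(
      Op{V->Kind == "interleave-group" ? "wide-load" : "single-scalar-load"}));
  Op *N = Pool.back().get();
  NarrowedOps.insert(N); // mirrors the patch's NarrowedOps.insert(L)
  return N;
}

int main() {
  std::vector<std::unique_ptr<Op>> Pool;
  std::unordered_set<Op *> NarrowedOps;
  Pool.push_back(std::make_unique<Op>(Op{"interleave-group"}));
  Op *Group = Pool.front().get();
  Op *First = narrowOp(Group, NarrowedOps, Pool);  // interleave-group -> wide-load
  Op *Second = narrowOp(First, NarrowedOps, Pool); // no-op: returns First
  std::cout << First->Kind << " " << (First == Second) << "\n"; // wide-load 1
}
```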
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/158013
[llvm-branch-commits] Prepare libcxx and libcxxabi for pointer field protection. (PR #151651)
@@ -484,8 +484,21 @@ typedef __char32_t char32_t; #define _LIBCPP_EXCEPTIONS_SIG e # endif +# if !_LIBCPP_HAS_EXCEPTIONS +#define _LIBCPP_EXCEPTIONS_SIG n +# else +#define _LIBCPP_EXCEPTIONS_SIG e +# endif + +# if defined(__POINTER_FIELD_PROTECTION__) +#define _LIBCPP_PFP_SIG p +# else +#define _LIBCPP_PFP_SIG +# endif pcc wrote: Yes, the in-memory pointer format changes so it's effectively a layout (ABI) change. Therefore we need an ABI tag change to detect/prevent linking against mismatching ABIs. This was requested by @mordante in #133538. https://github.com/llvm/llvm-project/pull/151651
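A minimal sketch of the ABI-tag scheme being discussed, using hypothetical `DEMO_*` macro names rather than libc++'s real ones: each ABI-affecting configuration contributes a letter to a tag, and when that tag is baked into mangled symbol names, mixing translation units built with mismatching configurations fails at link time instead of corrupting memory.

```cpp
#include <iostream>

#if defined(DEMO_HAS_EXCEPTIONS)
#define DEMO_EXCEPTIONS_SIG "e"
#else
#define DEMO_EXCEPTIONS_SIG "n"
#endif

#if defined(DEMO_PFP) // stands in for __POINTER_FIELD_PROTECTION__
#define DEMO_PFP_SIG "p"
#else
#define DEMO_PFP_SIG ""
#endif

// The full ABI tag concatenates one fragment per ABI-affecting feature.
#define DEMO_ABI_TAG DEMO_EXCEPTIONS_SIG DEMO_PFP_SIG

int main() {
  // Built with -DDEMO_HAS_EXCEPTIONS -DDEMO_PFP this prints "ep"; built with
  // neither it prints "n". A different tag means different mangled symbols,
  // so a configuration mismatch becomes a link-time error.
  std::cout << DEMO_ABI_TAG << "\n";
}
```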
[llvm-branch-commits] [mlir] 1e192c0 - Revert "[MLIR] Remove CopyOpInterface (#157711)"
Author: Mehdi Amini Date: 2025-09-11T10:25:36+01:00 New Revision: 1e192c006bf978fad12dbc4bba8c6213b6b9c907 URL: https://github.com/llvm/llvm-project/commit/1e192c006bf978fad12dbc4bba8c6213b6b9c907 DIFF: https://github.com/llvm/llvm-project/commit/1e192c006bf978fad12dbc4bba8c6213b6b9c907.diff LOG: Revert "[MLIR] Remove CopyOpInterface (#157711)" This reverts commit 63647074ba97dc606c7ba48c3800ec08ca501d92. Added: mlir/include/mlir/Interfaces/CopyOpInterface.h mlir/include/mlir/Interfaces/CopyOpInterface.td mlir/lib/Interfaces/CopyOpInterface.cpp Modified: mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td mlir/include/mlir/Dialect/Linalg/IR/Linalg.h mlir/include/mlir/Dialect/MemRef/IR/MemRef.h mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td mlir/include/mlir/Interfaces/CMakeLists.txt mlir/lib/Interfaces/CMakeLists.txt mlir/test/lib/Dialect/Test/TestDialect.h mlir/test/lib/Dialect/Test/TestOps.h mlir/test/lib/Dialect/Test/TestOps.td Removed: diff --git a/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h b/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h index e735651d5366d..1ef5370802953 100644 --- a/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h +++ b/mlir/include/mlir/Dialect/Bufferization/IR/Bufferization.h @@ -12,6 +12,7 @@ #include "mlir/Bytecode/BytecodeOpInterface.h" #include "mlir/Dialect/Bufferization/IR/AllocationOpInterface.h" #include "mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h" +#include "mlir/Interfaces/CopyOpInterface.h" #include "mlir/Interfaces/DestinationStyleOpInterface.h" #include "mlir/Interfaces/InferTypeOpInterface.h" #include "mlir/Interfaces/SubsetOpInterface.h" diff --git a/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td b/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td index 6724d4c483101..271b42025e0af 100644 --- a/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td +++ b/mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td @@ -18,6 +18,7 @@ include "mlir/Interfaces/DestinationStyleOpInterface.td" include "mlir/Interfaces/InferTypeOpInterface.td" include "mlir/Interfaces/SideEffectInterfaces.td" include "mlir/Interfaces/SubsetOpInterface.td" +include "mlir/Interfaces/CopyOpInterface.td" class Bufferization_Op traits = []> : Op; @@ -170,6 +171,7 @@ def Bufferization_AllocTensorOp : Bufferization_Op<"alloc_tensor", //===--===// def Bufferization_CloneOp : Bufferization_Op<"clone", [ +CopyOpInterface, MemoryEffectsOpInterface, DeclareOpInterfaceMethods ]> { diff --git a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h index 9de6d8fd50983..eb4e3810f0d07 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h +++ b/mlir/include/mlir/Dialect/Linalg/IR/Linalg.h @@ -22,6 +22,7 @@ #include "mlir/IR/ImplicitLocOpBuilder.h" #include "mlir/IR/TypeUtilities.h" #include "mlir/Interfaces/ControlFlowInterfaces.h" +#include "mlir/Interfaces/CopyOpInterface.h" #include "mlir/Interfaces/DestinationStyleOpInterface.h" #include "mlir/Interfaces/InferTypeOpInterface.h" #include "mlir/Interfaces/SideEffectInterfaces.h" diff --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h index bdec699eb4ce4..ac383ab46e7a5 100644 --- a/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h +++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRef.h @@ -16,6 +16,7 @@ #include "mlir/Interfaces/CallInterfaces.h" #include "mlir/Interfaces/CastInterfaces.h" #include 
"mlir/Interfaces/ControlFlowInterfaces.h" +#include "mlir/Interfaces/CopyOpInterface.h" #include "mlir/Interfaces/InferIntRangeInterface.h" #include "mlir/Interfaces/InferTypeOpInterface.h" #include "mlir/Interfaces/MemorySlotInterfaces.h" diff --git a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td index 513a9a18198a3..d6b7a97179b71 100644 --- a/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td +++ b/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td @@ -13,6 +13,7 @@ include "mlir/Dialect/Arith/IR/ArithBase.td" include "mlir/Dialect/MemRef/IR/MemRefBase.td" include "mlir/Interfaces/CastInterfaces.td" include "mlir/Interfaces/ControlFlowInterfaces.td" +include "mlir/Interfaces/CopyOpInterface.td" include "mlir/Interfaces/InferIntRangeInterface.td" include "mlir/Interfaces/InferTypeOpInterface.td" include "mlir/Interfaces/MemorySlotInterfaces.td" @@ -529,7 +530,7 @@ def MemRef_CastOp : MemRef_Op<"cast", [ // CopyOp //===--===// -def CopyOp : MemRef_Op<"copy", [SameOperandsElementType, +def CopyOp : MemRef_Op<"copy", [
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (PR #145330)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/145330 >From 41b0c715809685ab360559cf47af2fa3ddb8f036 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Tue, 17 Jun 2025 04:03:53 -0400 Subject: [PATCH 1/2] [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125. --- .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 2 +- .../CodeGen/SelectionDAG/TargetLowering.cpp | 21 +- llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 6 +- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 7 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 67 ++ .../AMDGPU/ptradd-sdag-optimizations.ll | 196 ++ 6 files changed, 105 insertions(+), 194 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp index bcf25958d0982..4ce58c0027aa6 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -8554,7 +8554,7 @@ static bool isMemSrcFromConstant(SDValue Src, ConstantDataArraySlice &Slice) { GlobalAddressSDNode *G = nullptr; if (Src.getOpcode() == ISD::GlobalAddress) G = cast(Src); - else if (Src.getOpcode() == ISD::ADD && + else if (Src->isAnyAdd() && Src.getOperand(0).getOpcode() == ISD::GlobalAddress && Src.getOperand(1).getOpcode() == ISD::Constant) { G = cast(Src.getOperand(0)); diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index fd6d20e146bb2..e4d45f14a0c44 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -638,8 +638,14 @@ bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned BitWidth, // operands on the new node are also disjoint. SDNodeFlags Flags(Op->getFlags().hasDisjoint() ? SDNodeFlags::Disjoint : SDNodeFlags::None); + unsigned Opcode = Op.getOpcode(); + if (Opcode == ISD::PTRADD) { +// It isn't a ptradd anymore if it doesn't operate on the entire +// pointer. +Opcode = ISD::ADD; + } SDValue X = DAG.getNode( - Op.getOpcode(), dl, SmallVT, + Opcode, dl, SmallVT, DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)), DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1)), Flags); assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?"); @@ -2860,6 +2866,11 @@ bool TargetLowering::SimplifyDemandedBits( return TLO.CombineTo(Op, And1); } [[fallthrough]]; + case ISD::PTRADD: +if (Op.getOperand(0).getValueType() != Op.getOperand(1).getValueType()) + break; +// PTRADD behaves like ADD if pointers are represented as integers. +[[fallthrough]]; case ISD::ADD: case ISD::SUB: { // Add, Sub, and Mul don't demand any bits in positions beyond that @@ -2969,10 +2980,10 @@ bool TargetLowering::SimplifyDemandedBits( if (Op.getOpcode() == ISD::MUL) { Known = KnownBits::mul(KnownOp0, KnownOp1); -} else { // Op.getOpcode() is either ISD::ADD or ISD::SUB. +} else { // Op.getOpcode() is either ISD::ADD, ISD::PTRADD, or ISD::SUB. 
Known = KnownBits::computeForAddSub( - Op.getOpcode() == ISD::ADD, Flags.hasNoSignedWrap(), - Flags.hasNoUnsignedWrap(), KnownOp0, KnownOp1); + Op->isAnyAdd(), Flags.hasNoSignedWrap(), Flags.hasNoUnsignedWrap(), + KnownOp0, KnownOp1); } break; } @@ -5675,7 +5686,7 @@ bool TargetLowering::isGAPlusOffset(SDNode *WN, const GlobalValue *&GA, return true; } - if (N->getOpcode() == ISD::ADD) { + if (N->isAnyAdd()) { SDValue N1 = N->getOperand(0); SDValue N2 = N->getOperand(1); if (isGAPlusOffset(N1.getNode(), GA, Offset)) { diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp index 3785d0f7f2688..a0c2e60efcd9a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp @@ -1531,7 +1531,7 @@ bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, SDValue &Ptr, SDValue &VAddr, C1 = nullptr; } - if (N0.getOpcode() == ISD::ADD) { + if (N0->isAnyAdd()) { // (add N2, N3) -> addr64, or // (add (add N2, N3), C1) -> addr64 SDValue N2 = N0.getOperand(0); @@ -1993,7 +1993,7 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr, } // Match the variable offset. - if (Addr.getOpcode() == ISD::ADD) { + if (Addr->isAnyAdd()) { LHS = Addr.getOperand(0); if (!LHS
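The claim that "PTRADD behaves like ADD if pointers are represented as integers" can be illustrated with a small self-contained C++ sketch of known-low-bits propagation, the kind of fact `KnownBits::computeForAddSub` tracks; `lowBitsZero` is a hypothetical helper, not an LLVM API.

```cpp
#include <cstdint>
#include <iostream>

// If both addends have their low K bits known zero, so does the sum: no carry
// can be generated below bit K. This holds whether the left operand is a plain
// integer or an integer-represented pointer, which is why the ADD known-bits
// path is sound for PTRADD.
static bool lowBitsZero(uint64_t V, unsigned K) {
  return (V & ((uint64_t{1} << K) - 1)) == 0;
}

int main() {
  uint64_t Ptr = 0x1000; // 16-byte-aligned base address (low 4 bits zero)
  uint64_t Off = 0x30;   // offset that is a multiple of 16
  std::cout << lowBitsZero(Ptr, 4) << lowBitsZero(Off, 4)
            << lowBitsZero(Ptr + Off, 4) << "\n"; // prints 111
}
```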
[llvm-branch-commits] [llvm] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (PR #146074)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146074 >From 62623004e49ca66a426455e4b3ac4028f10f68fd Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Thu, 26 Jun 2025 06:10:35 -0400 Subject: [PATCH 1/2] [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds, that targets can use to allow transformations to introduce out-of-bounds pointer arithmetic. It also moves two such transformations from the AMDGPU-specific DAG combines to the generic DAGCombiner. This is motivated by target features like AArch64's checked pointer arithmetic, CPA, which does not tolerate the introduction of out-of-bounds pointer arithmetic. --- llvm/include/llvm/CodeGen/TargetLowering.h| 7 + llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 125 +++--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 59 ++--- llvm/lib/Target/AMDGPU/SIISelLowering.h | 3 + 4 files changed, 94 insertions(+), 100 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h index 2ba8b29e775e0..d3aa168aaa861 100644 --- a/llvm/include/llvm/CodeGen/TargetLowering.h +++ b/llvm/include/llvm/CodeGen/TargetLowering.h @@ -3518,6 +3518,13 @@ class LLVM_ABI TargetLoweringBase { return false; } + /// True if the target allows transformations of in-bounds pointer + /// arithmetic that cause out-of-bounds intermediate results. + virtual bool canTransformPtrArithOutOfBounds(const Function &F, + EVT PtrVT) const { +return false; + } + /// Does this target support complex deinterleaving virtual bool isComplexDeinterleavingSupported() const { return false; } diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp index d130efe96b56b..9ee74cf5fbbdd 100644 --- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp @@ -2696,59 +2696,82 @@ SDValue DAGCombiner::visitPTRADD(SDNode *N) { if (PtrVT == IntVT && isNullConstant(N0)) return N1; - if (N0.getOpcode() != ISD::PTRADD || - reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) -return SDValue(); - - SDValue X = N0.getOperand(0); - SDValue Y = N0.getOperand(1); - SDValue Z = N1; - bool N0OneUse = N0.hasOneUse(); - bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); - bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); - - // (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: - // * y is a constant and (ptradd x, y) has one use; or - // * y and z are both constants. - if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) { -// If both additions in the original were NUW, the new ones are as well. -SDNodeFlags Flags = -(N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap; -SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags); -AddToWorklist(Add.getNode()); -return DAG.getMemBasePlusOffset(X, Add, DL, Flags); + if (N0.getOpcode() == ISD::PTRADD && + !reassociationCanBreakAddressingModePattern(ISD::PTRADD, DL, N, N0, N1)) { +SDValue X = N0.getOperand(0); +SDValue Y = N0.getOperand(1); +SDValue Z = N1; +bool N0OneUse = N0.hasOneUse(); +bool YIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Y); +bool ZIsConstant = DAG.isConstantIntBuildVectorOrConstantInt(Z); + +// (ptradd (ptradd x, y), z) -> (ptradd x, (add y, z)) if: +// * y is a constant and (ptradd x, y) has one use; or +// * y and z are both constants. 
+if ((YIsConstant && N0OneUse) || (YIsConstant && ZIsConstant)) { + // If both additions in the original were NUW, the new ones are as well. + SDNodeFlags Flags = + (N->getFlags() & N0->getFlags()) & SDNodeFlags::NoUnsignedWrap; + SDValue Add = DAG.getNode(ISD::ADD, DL, IntVT, {Y, Z}, Flags); + AddToWorklist(Add.getNode()); + return DAG.getMemBasePlusOffset(X, Add, DL, Flags); +} + } + + // The following combines can turn in-bounds pointer arithmetic out of bounds. + // That is problematic for settings like AArch64's CPA, which checks that + // intermediate results of pointer arithmetic remain in bounds. The target + // therefore needs to opt-in to enable them. + if (!TLI.canTransformPtrArithOutOfBounds( + DAG.getMachineFunction().getFunction(), PtrVT)) +return SDValue(); + + if (N0.getOpcode() == ISD::PTRADD && N1.getOpcode() == ISD::Constant) { +// Fold (ptradd (ptradd GA, v), c) -> (ptradd (ptradd GA, c) v) with +// global address GA and constant c, such that c can be folded into GA. +SDValue GAValue = N0.getOperand(0); +if (const GlobalAddressSDNode *GA = +dyn_cast(GAValue)) { + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + if (!LegalOperations && TLI.
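A self-contained C++ sketch of the hazard this hook guards against, using a hypothetical 100-byte object: the two associations of `p + y + z` produce different intermediate pointers, so a reassociating transform can introduce an intermediate that is out of bounds even though the final result is in bounds, which targets with checked pointer arithmetic (such as AArch64 CPA) reject.

```cpp
#include <cstdint>
#include <iostream>

int main() {
  unsigned char Obj[100]; // the only valid region is [Obj, Obj + 100)
  uintptr_t P = reinterpret_cast<uintptr_t>(Obj);
  intptr_t Y = 200, Z = -150;
  // Shape (ptradd (ptradd p, y), z): the intermediate p + 200 lies outside
  // the object, even though the final result p + 50 is in bounds.
  uintptr_t Intermediate = P + Y;
  // Shape (ptradd p, (add y, z)): no out-of-bounds intermediate pointer.
  uintptr_t Final = P + (Y + Z);
  std::cout << (Intermediate - P) << " " << (Final - P) << "\n"; // 200 50
}
```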
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases (PR #145329)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/145329 >From 345456442d0d9e5a8babd9b72b8343d6608399d5 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Tue, 17 Jun 2025 03:51:19 -0400 Subject: [PATCH] [AMDGPU][SDAG] Test ISD::PTRADD handling in various special cases Pre-committing tests to show improvements in a follow-up PR. --- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 63 ++ .../AMDGPU/ptradd-sdag-optimizations.ll | 206 ++ 2 files changed, 269 insertions(+) create mode 100644 llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll new file mode 100644 index 0..fab56383ffa8a --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll @@ -0,0 +1,63 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=1 < %s | FileCheck --check-prefixes=GFX6,GFX6_PTRADD %s +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=tahiti -amdgpu-use-sdag-ptradd=0 < %s | FileCheck --check-prefixes=GFX6,GFX6_LEGACY %s + +; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectMUBUF. + +define amdgpu_kernel void @v_add_i32(ptr addrspace(1) %out, ptr addrspace(1) %in) { +; GFX6_PTRADD-LABEL: v_add_i32: +; GFX6_PTRADD: ; %bb.0: +; GFX6_PTRADD-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0 +; GFX6_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX6_PTRADD-NEXT:s_mov_b32 s7, 0x100f000 +; GFX6_PTRADD-NEXT:s_mov_b32 s10, 0 +; GFX6_PTRADD-NEXT:s_mov_b32 s11, s7 +; GFX6_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX6_PTRADD-NEXT:v_mov_b32_e32 v1, s3 +; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, s2, v0 +; GFX6_PTRADD-NEXT:v_addc_u32_e32 v1, vcc, 0, v1, vcc +; GFX6_PTRADD-NEXT:s_mov_b32 s8, s10 +; GFX6_PTRADD-NEXT:s_mov_b32 s9, s10 +; GFX6_PTRADD-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc +; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0) +; GFX6_PTRADD-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 offset:4 glc +; GFX6_PTRADD-NEXT:s_waitcnt vmcnt(0) +; GFX6_PTRADD-NEXT:s_mov_b32 s6, -1 +; GFX6_PTRADD-NEXT:s_mov_b32 s4, s0 +; GFX6_PTRADD-NEXT:s_mov_b32 s5, s1 +; GFX6_PTRADD-NEXT:v_add_i32_e32 v0, vcc, v2, v0 +; GFX6_PTRADD-NEXT:buffer_store_dword v0, off, s[4:7], 0 +; GFX6_PTRADD-NEXT:s_endpgm +; +; GFX6_LEGACY-LABEL: v_add_i32: +; GFX6_LEGACY: ; %bb.0: +; GFX6_LEGACY-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0 +; GFX6_LEGACY-NEXT:s_mov_b32 s7, 0x100f000 +; GFX6_LEGACY-NEXT:s_mov_b32 s10, 0 +; GFX6_LEGACY-NEXT:s_mov_b32 s11, s7 +; GFX6_LEGACY-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX6_LEGACY-NEXT:s_waitcnt lgkmcnt(0) +; GFX6_LEGACY-NEXT:s_mov_b64 s[8:9], s[2:3] +; GFX6_LEGACY-NEXT:v_mov_b32_e32 v1, 0 +; GFX6_LEGACY-NEXT:buffer_load_dword v2, v[0:1], s[8:11], 0 addr64 glc +; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0) +; GFX6_LEGACY-NEXT:buffer_load_dword v0, v[0:1], s[8:11], 0 addr64 offset:4 glc +; GFX6_LEGACY-NEXT:s_waitcnt vmcnt(0) +; GFX6_LEGACY-NEXT:s_mov_b32 s6, -1 +; GFX6_LEGACY-NEXT:s_mov_b32 s4, s0 +; GFX6_LEGACY-NEXT:s_mov_b32 s5, s1 +; GFX6_LEGACY-NEXT:v_add_i32_e32 v0, vcc, v2, v0 +; GFX6_LEGACY-NEXT:buffer_store_dword v0, off, s[4:7], 0 +; GFX6_LEGACY-NEXT:s_endpgm + %tid = call i32 @llvm.amdgcn.workitem.id.x() + %gep = getelementptr inbounds i32, ptr addrspace(1) %in, i32 %tid + %b_ptr = getelementptr i32, ptr addrspace(1) %gep, i32 1 + %a = load volatile i32, ptr addrspace(1) %gep + %b = load volatile i32, ptr addrspace(1) %b_ptr + %result = add i32 %a, %b + store i32 %result, ptr addrspace(1) %out + ret 
void +} + +;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line: +; GFX6: {{.*}} diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll index 0fe4d337a5bd7..41e47e834b723 100644 --- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll @@ -290,3 +290,209 @@ define ptr @fold_mul24_into_mad(ptr %base, i64 %a, i64 %b) { %gep = getelementptr inbounds i8, ptr %base, i64 %mul ret ptr %gep } + +; Test PTRADD handling in AMDGPUDAGToDAGISel::SelectGlobalSAddr. +define amdgpu_kernel void @uniform_base_varying_offset_imm(ptr addrspace(1) %p) { +; GFX942_PTRADD-LABEL: uniform_base_varying_offset_imm: +; GFX942_PTRADD: ; %bb.0: ; %entry +; GFX942_PTRADD-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0 +; GFX942_PTRADD-NEXT:v_and_b32_e32 v0, 0x3ff, v0 +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v1, 0 +; GFX942_PTRADD-NEXT:v_lshlrev_b32_e32 v0, 2, v0 +; GFX942_PTRADD-NEXT:v_mov_b32_e32 v2, 1 +; GFX942_PTRADD-NEXT:s_waitcnt lgkmcnt(0) +; GFX942_PTRADD-NEXT:v_lshl_add_u64 v[0:1], s[0:1], 0, v[0:1] +; GFX942_PTRAD
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146075 >From 18dcde6a8c7bddfbd56077dc81b0b80535cc49a1 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Fri, 27 Jun 2025 04:23:50 -0400 Subject: [PATCH 1/5] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which we will soon need to fold offsets into FLAT instructions. Currently, disjoint ORs can still be used for offset folding, so that part of the logic can't be tested. The PR contains a hacky workaround for a situation where an AssertAlign operand of a PTRADD is not DAGCombined before the PTRADD, causing the PTRADD to be turned into a disjoint OR although reassociating it with the operand of the AssertAlign would be better. This wouldn't be a problem if the DAGCombiner ensured that a node is only processed after all its operands have been processed. For SWDEV-516125. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 35 .../AMDGPU/ptradd-sdag-optimizations.ll | 56 ++- 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a1af50dac7e54..ec7002bdd9f43 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -15822,6 +15822,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N, return Folded; } + // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if + // that transformation can't block an offset folding at any use of the ptradd. + // This should be done late, after legalization, so that it doesn't block + // other ptradd combines that could enable more offset folding. + bool HasIntermediateAssertAlign = + N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd(); + // This is a hack to work around an ordering problem for DAGs like this: + // (ptradd (AssertAlign (ptradd p, c1), k), c2) + // If the outer ptradd is handled first by the DAGCombiner, it can be + // transformed into a disjoint or. Then, when the generic AssertAlign combine + // pushes the AssertAlign through the inner ptradd, it's too late for the + // ptradd reassociation to trigger. + if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign && + DAG.haveNoCommonBitsSet(N0, N1)) { +bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) { + if (auto *LoadStore = dyn_cast(User); + LoadStore && LoadStore->getBasePtr().getNode() == N) { +unsigned AS = LoadStore->getAddressSpace(); +// Currently, we only really need ptradds to fold offsets into flat +// memory instructions. 
+if (AS != AMDGPUAS::FLAT_ADDRESS) + return false; +TargetLoweringBase::AddrMode AM; +AM.HasBaseReg = true; +EVT VT = LoadStore->getMemoryVT(); +Type *AccessTy = VT.getTypeForEVT(*DAG.getContext()); +return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS); + } + return false; +}); + +if (!TransformCanBreakAddrMode) + return DAG.getNode(ISD::OR, DL, VT, N0, N1, SDNodeFlags::Disjoint); + } + if (N1.getOpcode() != ISD::ADD || !N1.hasOneUse()) return SDValue(); diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll index 199c1f61d2522..7d7fe141e5440 100644 --- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll +++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll @@ -100,7 +100,7 @@ define void @baseptr_null(i64 %offset, i8 %v) { ; Taken from implicit-kernarg-backend-usage.ll, tests the PTRADD handling in the ; assertalign DAG combine. -define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) #0 { +define amdgpu_kernel void @llvm_amdgcn_queue_ptr(ptr addrspace(1) %ptr) { ; GFX942-LABEL: llvm_amdgcn_queue_ptr: ; GFX942: ; %bb.0: ; GFX942-NEXT:v_mov_b32_e32 v0, 0 @@ -415,6 +415,60 @@ entry: ret void } +; Check that ptradds can be lowered to disjoint ORs. +define ptr @gep_disjoint_or(ptr %base) { +; GFX942-LABEL: gep_disjoint_or: +; GFX942: ; %bb.0: +; GFX942-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX942-NEXT:v_and_or_b32 v0, v0, -16, 4 +; GFX942-NEXT:s_setpc_b64 s[30:31] + %p = call ptr @llvm.ptrmask(ptr %base, i64 s0xf0) + %gep = getelementptr nuw inbounds i8, ptr %p, i64 4 + ret ptr %gep +} + +; Check that AssertAlign no
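The identity this combine relies on can be checked with a few lines of standalone C++ (a sketch, not LLVM code): when two values share no set bits, addition produces no carries, so `a + b == (a | b)` and the pointer add can be emitted as a disjoint OR.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t Base = 0xFFF0; // low 4 bits clear, e.g. a 16-byte-aligned address
  uint64_t Off = 0x4;     // fits entirely within the low 4 bits
  assert((Base & Off) == 0);          // haveNoCommonBitsSet, in LLVM terms
  assert(Base + Off == (Base | Off)); // disjoint bits => no carries => add == or
  return 0;
}
```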
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/146076 >From 8710de705f09d90f166f82c1733620b2c8581306 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Fri, 27 Jun 2025 05:38:52 -0400 Subject: [PATCH 1/3] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default Also removes the command line option to control this feature. There seem to be mainly two kinds of test changes: - Some operands of addition instructions are swapped; that is to be expected since PTRADD is not commutative. - Improvements in code generation, probably because the legacy lowering enabled some transformations that were sometimes harmful. For SWDEV-516125. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 10 +- .../identical-subrange-spill-infloop.ll | 352 +++--- .../AMDGPU/infer-addrspace-flat-atomic.ll | 14 +- llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll | 8 +- .../AMDGPU/lower-module-lds-via-hybrid.ll | 4 +- .../AMDGPU/lower-module-lds-via-table.ll | 16 +- .../match-perm-extract-vector-elt-bug.ll | 22 +- llvm/test/CodeGen/AMDGPU/memmove-var-size.ll | 16 +- .../AMDGPU/preload-implicit-kernargs.ll | 6 +- .../AMDGPU/promote-constOffset-to-imm.ll | 8 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 7 +- .../AMDGPU/ptradd-sdag-optimizations.ll | 94 ++--- .../AMDGPU/ptradd-sdag-undef-poison.ll| 6 +- llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 27 +- llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll | 29 +- 15 files changed, 310 insertions(+), 309 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index a1af50dac7e54..05ab745171f6d 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -63,14 +63,6 @@ static cl::opt UseDivergentRegisterIndexing( cl::desc("Use indirect register addressing for divergent indexes"), cl::init(false)); -// TODO: This option should be removed once we switch to always using PTRADD in -// the SelectionDAG. 
-static cl::opt UseSelectionDAGPTRADD( -"amdgpu-use-sdag-ptradd", cl::Hidden, -cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the " - "SelectionDAG ISel"), -cl::init(false)); - static bool denormalModeIsFlushAllF32(const MachineFunction &MF) { const SIMachineFunctionInfo *Info = MF.getInfo(); return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign(); @@ -11252,7 +11244,7 @@ static bool isNoUnsignedWrap(SDValue Addr) { bool SITargetLowering::shouldPreservePtrArith(const Function &F, EVT PtrVT) const { - return UseSelectionDAGPTRADD && PtrVT == MVT::i64; + return PtrVT == MVT::i64; } bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F, diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll index 2c03113e8af47..805cdd37d6e70 100644 --- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll +++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll @@ -6,96 +6,150 @@ define void @main(i1 %arg) #0 { ; CHECK: ; %bb.0: ; %bb ; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1 -; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill -; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill +; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill ; CHECK-NEXT:s_mov_b64 exec, s[4:5] -; CHECK-NEXT:v_writelane_b32 v5, s30, 0 -; CHECK-NEXT:v_writelane_b32 v5, s31, 1 -; CHECK-NEXT:v_writelane_b32 v5, s36, 2 -; CHECK-NEXT:v_writelane_b32 v5, s37, 3 -; CHECK-NEXT:v_writelane_b32 v5, s38, 4 -; CHECK-NEXT:v_writelane_b32 v5, s39, 5 -; CHECK-NEXT:v_writelane_b32 v5, s48, 6 -; CHECK-NEXT:v_writelane_b32 v5, s49, 7 -; CHECK-NEXT:v_writelane_b32 v5, s50, 8 -; CHECK-NEXT:v_writelane_b32 v5, s51, 9 -; CHECK-NEXT:v_writelane_b32 v5, s52, 10 -; CHECK-NEXT:v_writelane_b32 v5, s53, 11 -; CHECK-NEXT:v_writelane_b32 v5, s54, 12 -; CHECK-NEXT:v_writelane_b32 v5, s55, 13 -; CHECK-NEXT:s_getpc_b64 s[24:25] -; CHECK-NEXT:v_writelane_b32 v5, s64, 14 -; CHECK-NEXT:s_movk_i32 s4, 0xf0 -; CHECK-NEXT:s_mov_b32 s5, s24 -; CHECK-NEXT:v_writelane_b32 v5, s65, 15 -; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0 -; CHECK-NEXT:s_mov_b64 s[4:5], 0 -; CHECK-NEXT:v_writelane_b32 v5, s66, 16 -; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0 -; CHECK-NEXT:v_writelane_b32 v5, s67, 17 -; CHECK-NEXT:s_waitcnt lgkmcnt(0) -; CHECK-NEXT:s_movk_i32 s6, 0x130 -; CHECK-NEXT:s_mov_b32 s7, s24 -; CHECK-NEXT:v_writelane_b32 v5
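The opt-in mechanism that replaces the removed `cl::opt` is the usual virtual-hook pattern: generic code queries a `TargetLowering` method that targets override. A minimal standalone sketch with illustrative class names, not the real AMDGPU classes:

```cpp
#include <iostream>

struct TargetLoweringDemo {
  // Generic default: targets do not request PTRADD-preserving lowering.
  virtual bool shouldPreservePtrArith(unsigned PtrBits) const { return false; }
  virtual ~TargetLoweringDemo() = default;
};

struct AMDGPUDemo : TargetLoweringDemo {
  // Target override, mirroring the patch: opt in for 64-bit pointers only.
  bool shouldPreservePtrArith(unsigned PtrBits) const override {
    return PtrBits == 64;
  }
};

int main() {
  AMDGPUDemo TLI;
  const TargetLoweringDemo &Generic = TLI;
  std::cout << Generic.shouldPreservePtrArith(64)          // 1: emit PTRADD
            << Generic.shouldPreservePtrArith(32) << "\n"; // 0: legacy path
}
```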
[llvm-branch-commits] [llvm] release/21.x: [VPlan] Don't narrow op multiple times in narrowInterleaveGroups. (PR #158013)
https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/158013
[llvm-branch-commits] [llvm] [NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)
https://github.com/ergawy edited https://github.com/llvm/llvm-project/pull/155993
[llvm-branch-commits] [mlir] 4a4cc8c - Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194)"
Author: Mehdi Amini Date: 2025-09-11T13:33:57+01:00 New Revision: 4a4cc8c0fcec63de73d6b14d258204593c181b79 URL: https://github.com/llvm/llvm-project/commit/4a4cc8c0fcec63de73d6b14d258204593c181b79 DIFF: https://github.com/llvm/llvm-project/commit/4a4cc8c0fcec63de73d6b14d258204593c181b79.diff LOG: Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194)" This reverts commit c84f34bcd8c7fb6d5038b3f52da8c7be64ad5189. Added: Modified: llvm/include/llvm/Support/Debug.h llvm/include/llvm/Support/DebugLog.h llvm/unittests/Support/DebugLogTest.cpp mlir/lib/Dialect/Transform/IR/TransformOps.cpp Removed: diff --git a/llvm/include/llvm/Support/Debug.h b/llvm/include/llvm/Support/Debug.h index b73f2d7c8b852..a7795d403721c 100644 --- a/llvm/include/llvm/Support/Debug.h +++ b/llvm/include/llvm/Support/Debug.h @@ -44,6 +44,11 @@ class raw_ostream; /// level, return false. LLVM_ABI bool isCurrentDebugType(const char *Type, int Level = 0); +/// Overload allowing to swap the order of the Type and Level arguments. +LLVM_ABI inline bool isCurrentDebugType(int Level, const char *Type) { + return isCurrentDebugType(Type, Level); +} + /// setCurrentDebugType - Set the current debug type, as if the -debug-only=X /// option were specified. Note that DebugFlag also needs to be set to true for /// debug output to be produced. diff --git a/llvm/include/llvm/Support/DebugLog.h b/llvm/include/llvm/Support/DebugLog.h index 33586dd275573..dce706e196bde 100644 --- a/llvm/include/llvm/Support/DebugLog.h +++ b/llvm/include/llvm/Support/DebugLog.h @@ -19,55 +19,52 @@ namespace llvm { #ifndef NDEBUG -/// LDBG() is a macro that can be used as a raw_ostream for debugging. -/// It will stream the output to the dbgs() stream, with a prefix of the -/// debug type and the file and line number. A trailing newline is added to the -/// output automatically. If the streamed content contains a newline, the prefix -/// is added to each beginning of a new line. Nothing is printed if the debug -/// output is not enabled or the debug type does not match. -/// -/// E.g., -/// LDBG() << "Bitset contains: " << Bitset; -/// is equivalent to -/// LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" << -/// __LINE__ << " " -/// << "Bitset contains: " << Bitset << "\n"); -/// +// LDBG() is a macro that can be used as a raw_ostream for debugging. +// It will stream the output to the dbgs() stream, with a prefix of the +// debug type and the file and line number. A trailing newline is added to the +// output automatically. If the streamed content contains a newline, the prefix +// is added to each beginning of a new line. Nothing is printed if the debug +// output is not enabled or the debug type does not match. +// +// E.g., +// LDBG() << "Bitset contains: " << Bitset; +// is somehow equivalent to +// LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" << +// __LINE__ << " " +// << "Bitset contains: " << Bitset << "\n"); +// // An optional `level` argument can be provided to control the verbosity of the -/// output. The default level is 1, and is in increasing level of verbosity. -/// -/// The `level` argument can be a literal integer, or a macro that evaluates to -/// an integer. -/// -/// An optional `type` argument can be provided to control the debug type. The -/// default type is DEBUG_TYPE. The `type` argument can be a literal string, or -/// a macro that evaluates to a string. 
-/// -/// E.g., -/// LDBG(2) << "Bitset contains: " << Bitset; -/// LDBG("debug_type") << "Bitset contains: " << Bitset; -/// LDBG("debug_type", 2) << "Bitset contains: " << Bitset; +// output. The default level is 1, and is in increasing level of verbosity. +// +// The `level` argument can be a literal integer, or a macro that evaluates to +// an integer. +// +// An optional `type` argument can be provided to control the debug type. The +// default type is DEBUG_TYPE. The `type` argument can be a literal string, or a +// macro that evaluates to a string. #define LDBG(...) _GET_LDBG_MACRO(__VA_ARGS__)(__VA_ARGS__) -/// LDBG_OS() is a macro that behaves like LDBG() but instead of directly using -/// it to stream the output, it takes a callback function that will be called -/// with a raw_ostream. -/// This is useful when you need to pass a `raw_ostream` to a helper function to -/// be able to print (when the `<<` operator is not available). -/// -/// E.g., -/// LDBG_OS([&] (raw_ostream &Os) { -/// Os << "Pass Manager contains: "; -/// pm.printAsTextual(Os); -/// }); -/// -/// Just like LDBG(), it optionally accepts a `level` and `type` arguments. -/// E.g., -/// LDBG_OS(2, [&] (raw_ostream &Os) { ... }); -/// LDBG_OS("debug_type", [&] (raw_ostream &Os) { ... }); -/// LDBG_OS("debug_type", 2, [&] (raw_ostream &Os) { ... }); -/// -#define LDBG_OS(
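The `_GET_LDBG_MACRO(__VA_ARGS__)(__VA_ARGS__)` line in the reverted header uses the classic arity-dispatch trick: a selector macro picks an implementation based on how many arguments were supplied. A minimal standalone C++ sketch with hypothetical `DEMO_*` names (the real LDBG additionally filters on DEBUG_TYPE and verbosity level and prefixes each output line):

```cpp
#include <iostream>

#define DEMO_LOG_1(LVL) std::cerr << "[default:" << (LVL) << "] "
#define DEMO_LOG_2(TYPE, LVL) std::cerr << "[" TYPE ":" << (LVL) << "] "

// Select the third argument; padding the call with candidate implementations
// after __VA_ARGS__ makes the argument count choose among them.
#define DEMO_GET(_1, _2, NAME, ...) NAME
#define DEMO_LOG(...) DEMO_GET(__VA_ARGS__, DEMO_LOG_2, DEMO_LOG_1, unused)(__VA_ARGS__)

int main() {
  DEMO_LOG(2) << "level only\n";          // expands to DEMO_LOG_1(2)
  DEMO_LOG("vplan", 3) << "type+level\n"; // expands to DEMO_LOG_2("vplan", 3)
}
```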
[llvm-branch-commits] [mlir] [mlir][Transforms] Simplify `ConversionPatternRewriter::replaceOp` implementation (PR #158075)
https://github.com/matthias-springer created https://github.com/llvm/llvm-project/pull/158075 Depends on #158067. >From 8113b1d6c7600dec5ccf93d6c3fe356c08dbc067 Mon Sep 17 00:00:00 2001 From: Matthias Springer Date: Wed, 3 Sep 2025 07:35:47 + Subject: [PATCH] proto --- .../Transforms/Utils/DialectConversion.cpp| 52 +++ 1 file changed, 20 insertions(+), 32 deletions(-) diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp index 4b483c32ecef9..52369c18faa61 100644 --- a/mlir/lib/Transforms/Utils/DialectConversion.cpp +++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp @@ -1618,6 +1618,8 @@ Block *ConversionPatternRewriterImpl::applySignatureConversion( if (!inputMap) { // This block argument was dropped and no replacement value was provided. // Materialize a replacement value "out of thin air". + // Note: Materialization must be built here because we cannot find a + // valid insertion point in the new block. (Will point to the old block.) Value mat = buildUnresolvedMaterialization( MaterializationKind::Source, @@ -1709,8 +1711,9 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue( // mapping. This includes cached materializations. We try to reuse those // instead of generating duplicate IR. ValueVector repl = lookupOrNull(value, value.getType()); - if (!repl.empty()) + if (!repl.empty()) { return repl.front(); + } // Check if the value is dead. No replacement value is needed in that case. // This is an approximate check that may have false negatives but does not @@ -1718,22 +1721,14 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue( // building source materializations that are never used and that fold away.) if (llvm::all_of(value.getUsers(), [&](Operation *op) { return replacedOps.contains(op); }) && - !mapping.isMappedTo(value)) + !mapping.isMappedTo(value)) { return Value(); + } // No replacement value was found. Get the latest replacement value // (regardless of the type) and build a source materialization to the // original type. repl = lookupOrNull(value); - if (repl.empty()) { -// No replacement value is registered in the mapping. This means that the -// value is dropped and no longer needed. (If the value were still needed, -// a source materialization producing a replacement value "out of thin air" -// would have already been created during `replaceOp` or -// `applySignatureConversion`.) -return Value(); - } - // Note: `computeInsertPoint` computes the "earliest" insertion point at // which all values in `repl` are defined. It is important to emit the // materialization at that location because the same materialization may be @@ -1741,13 +1736,19 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue( // in the conversion value mapping.) The insertion point of the // materialization must be valid for all future users that may be created // later in the conversion process. 
- Value castValue = - buildUnresolvedMaterialization(MaterializationKind::Source, - computeInsertPoint(repl), value.getLoc(), - /*valuesToMap=*/repl, /*inputs=*/repl, - /*outputTypes=*/value.getType(), - /*originalType=*/Type(), converter) - .front(); + OpBuilder::InsertPoint ip; + if (repl.empty()) { +ip = computeInsertPoint(value); + } else { +ip = computeInsertPoint(repl); + } + Value castValue = buildUnresolvedMaterialization( +MaterializationKind::Source, ip, value.getLoc(), +/*valuesToMap=*/repl, /*inputs=*/repl, +/*outputTypes=*/value.getType(), +/*originalType=*/Type(), converter, +/*isPureTypeConversion=*/!repl.empty()) +.front(); return castValue; } @@ -1897,21 +1898,8 @@ void ConversionPatternRewriterImpl::replaceOp( } // Create mappings for each of the new result values. - for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) { -if (repl.empty()) { - // This result was dropped and no replacement value was provided. - // Materialize a replacement value "out of thin air". - buildUnresolvedMaterialization( - MaterializationKind::Source, computeInsertPoint(result), - result.getLoc(), /*valuesToMap=*/{result}, /*inputs=*/ValueRange(), - /*outputTypes=*/result.getType(), /*originalType=*/Type(), - currentTypeConverter, /*isPureTypeConversion=*/false); - continue; -} - -// Remap result to replacement value. + for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) mapp
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (PR #158076)
shiltian wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests). * **#158076** 👈 (this PR, view in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/158076) * **#157978** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/158076
[llvm-branch-commits] [Remarks] BitstreamRemarkParser: Refactor error handling (PR #156511)
https://github.com/jroelofs approved this pull request. LGTM with some nits. Tests would be good, but I don't think we should block this on improving things there. https://github.com/llvm/llvm-project/pull/156511
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
@@ -82,20 +82,26 @@ struct LLVMRemarkSetupFormatError LLVMRemarkSetupFormatError>::LLVMRemarkSetupErrorInfo; }; -/// Setup optimization remarks that output to a file. +/// Setup optimization remarks that output to a file. The returned +/// ToolOutputFile must be kept open for writing until +/// \ref finalizeLLVMOptimizationRemarks() is called. LLVM_ABI Expected> setupLLVMOptimizationRemarks( LLVMContext &Context, StringRef RemarksFilename, StringRef RemarksPasses, StringRef RemarksFormat, bool RemarksWithHotness, std::optional RemarksHotnessThreshold = 0); /// Setup optimization remarks that output directly to a raw_ostream. -/// \p OS is managed by the caller and should be open for writing as long as \p -/// Context is streaming remarks to it. +/// \p OS is managed by the caller and must be open for writing until +/// \ref finalizeLLVMOptimizationRemarks() is called. LLVM_ABI Error setupLLVMOptimizationRemarks( LLVMContext &Context, raw_ostream &OS, StringRef RemarksPasses, StringRef RemarksFormat, bool RemarksWithHotness, std::optional RemarksHotnessThreshold = 0); +/// Finalize optimization remarks. This must be called before closing the +/// (file) stream that was used to setup the remarks. +LLVM_ABI void finalizeLLVMOptimizationRemarks(LLVMContext &Context); jroelofs wrote: Does the "resource" that this closes out have the same lifetime as the `ToolOutputFile`? If so, maybe this API could be simplified by moving this finalization into a subclass's dtor, then you can't forget it. https://github.com/llvm/llvm-project/pull/156715
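A minimal C++ sketch of the RAII shape jroelofs suggests, with a hypothetical `RemarkFileDemo` class standing in for a `ToolOutputFile` subclass: finalization moves into the destructor, so it runs before the stream closes and cannot be forgotten.

```cpp
#include <fstream>
#include <iostream>

class RemarkFileDemo {
  std::ofstream OS;

public:
  explicit RemarkFileDemo(const char *Path) : OS(Path) {}
  std::ofstream &os() { return OS; }
  // Finalization is tied to the object's lifetime: the destructor body runs
  // before the member ofstream is destroyed (and the file closed), so the
  // trailing metadata is always written and callers cannot forget it.
  ~RemarkFileDemo() { OS << "META_BLOCK\n"; }
};

int main() {
  {
    RemarkFileDemo F("remarks.demo");
    F.os() << "REMARK_BLOCK\n";
  } // dtor finalizes, then the stream closes
  std::cout << "done\n";
}
```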
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (PR #158076)
@@ -1296,6 +1303,157 @@ struct AAAMDGPUNoAGPR const char AAAMDGPUNoAGPR::ID = 0; +/// An abstract attribute to propagate the function attribute +/// "amdgpu-cluster-dims" from kernel entry functions to device functions. +struct AAAMDGPUClusterDims +: public StateWrapper { + using Base = StateWrapper; + AAAMDGPUClusterDims(const IRPosition &IRP, Attributor &A) : Base(IRP) {} + + /// Create an abstract attribute view for the position \p IRP. + static AAAMDGPUClusterDims &createForPosition(const IRPosition &IRP, +Attributor &A); + + /// See AbstractAttribute::getName(). + StringRef getName() const override { return "AAAMDGPUClusterDims"; } + + /// See AbstractAttribute::getIdAddr(). + const char *getIdAddr() const override { return &ID; } + + /// This function should return true if the type of the \p AA is + /// AAAMDGPUClusterDims. + static bool classof(const AbstractAttribute *AA) { +return (AA->getIdAddr() == &ID); + } + + virtual const AMDGPU::ClusterDimsAttr &getClusterDims() const = 0; + + /// Unique ID (due to the unique address) + static const char ID; +}; + +const char AAAMDGPUClusterDims::ID = 0; + +struct AAAMDGPUClusterDimsFunction : public AAAMDGPUClusterDims { + AAAMDGPUClusterDimsFunction(const IRPosition &IRP, Attributor &A) + : AAAMDGPUClusterDims(IRP, A) {} + + void initialize(Attributor &A) override { +Function *F = getAssociatedFunction(); +assert(F && "empty associated function"); + +Attr = AMDGPU::ClusterDimsAttr::get(*F); + +// No matter what a kernel function has, it is final. +if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) { + if (Attr.isUnknown()) +indicatePessimisticFixpoint(); + else +indicateOptimisticFixpoint(); +} + } + + const std::string getAsStr(Attributor *A) const override { +if (!getAssumed() || Attr.isUnknown()) + return "unknown"; +if (Attr.isNoCluster()) + return "no"; +if (Attr.isVariableedDims()) arsenm wrote: Find and replace typo? "isVariableedDims" https://github.com/llvm/llvm-project/pull/158076
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/146075
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
@@ -232,43 +221,40 @@ void BitstreamRemarkSerializerHelper::setupBlockInfo() { } void BitstreamRemarkSerializerHelper::emitMetaBlock( -uint64_t ContainerVersion, std::optional<uint64_t> RemarkVersion, -std::optional<const StringTable *> StrTab, std::optional<StringRef> Filename) { // Emit the meta block Bitstream.EnterSubblock(META_BLOCK_ID, 3); // The container version and type. R.clear(); R.push_back(RECORD_META_CONTAINER_INFO); - R.push_back(ContainerVersion); + R.push_back(CurrentContainerVersion); R.push_back(static_cast<uint64_t>(ContainerType)); Bitstream.EmitRecordWithAbbrev(RecordMetaContainerInfoAbbrevID, R); switch (ContainerType) { - case BitstreamRemarkContainerType::SeparateRemarksMeta: -assert(StrTab != std::nullopt && *StrTab != nullptr); -emitMetaStrTab(**StrTab); + case BitstreamRemarkContainerType::RemarksFileExternal: assert(Filename != std::nullopt); emitMetaExternalFile(*Filename); break; - case BitstreamRemarkContainerType::SeparateRemarksFile: -assert(RemarkVersion != std::nullopt); -emitMetaRemarkVersion(*RemarkVersion); -break; - case BitstreamRemarkContainerType::Standalone: -assert(RemarkVersion != std::nullopt); -emitMetaRemarkVersion(*RemarkVersion); -assert(StrTab != std::nullopt && *StrTab != nullptr); -emitMetaStrTab(**StrTab); + case BitstreamRemarkContainerType::RemarksFile: +emitMetaRemarkVersion(CurrentRemarkVersion); break; } Bitstream.ExitBlock(); jroelofs wrote: likewise https://github.com/llvm/llvm-project/pull/156715 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [lit] Implement ulimit builtin (PR #157958)
https://github.com/cmtice approved this pull request. https://github.com/llvm/llvm-project/pull/157958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DA] Fix Strong SIV test for symbolic coefficients and deltas (#149977) (PR #157738)
kasuga-fj wrote: Not yet. Feel free to go ahead if you’d like. https://github.com/llvm/llvm-project/pull/157738 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/mjklemm approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/156610 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][Transforms] Simplify `ConversionPatternRewriter::replaceOp` implementation (PR #158075)
https://github.com/matthias-springer updated https://github.com/llvm/llvm-project/pull/158075 >From d7d40567d7c5aa55210d965f01773fbc535e50ee Mon Sep 17 00:00:00 2001 From: Matthias Springer Date: Wed, 3 Sep 2025 07:35:47 + Subject: [PATCH] proto --- .../Transforms/Utils/DialectConversion.cpp| 46 +++ 1 file changed, 16 insertions(+), 30 deletions(-) diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp index 4b483c32ecef9..65bed7d85ec66 100644 --- a/mlir/lib/Transforms/Utils/DialectConversion.cpp +++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp @@ -1618,6 +1618,8 @@ Block *ConversionPatternRewriterImpl::applySignatureConversion( if (!inputMap) { // This block argument was dropped and no replacement value was provided. // Materialize a replacement value "out of thin air". + // Note: Materialization must be built here because we cannot find a + // valid insertion point in the new block. (Will point to the old block.) Value mat = buildUnresolvedMaterialization( MaterializationKind::Source, @@ -1725,15 +1727,6 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue( // (regardless of the type) and build a source materialization to the // original type. repl = lookupOrNull(value); - if (repl.empty()) { -// No replacement value is registered in the mapping. This means that the -// value is dropped and no longer needed. (If the value were still needed, -// a source materialization producing a replacement value "out of thin air" -// would have already been created during `replaceOp` or -// `applySignatureConversion`.) -return Value(); - } - // Note: `computeInsertPoint` computes the "earliest" insertion point at // which all values in `repl` are defined. It is important to emit the // materialization at that location because the same materialization may be @@ -1741,13 +1734,19 @@ Value ConversionPatternRewriterImpl::findOrBuildReplacementValue( // in the conversion value mapping.) The insertion point of the // materialization must be valid for all future users that may be created // later in the conversion process. - Value castValue = - buildUnresolvedMaterialization(MaterializationKind::Source, - computeInsertPoint(repl), value.getLoc(), - /*valuesToMap=*/repl, /*inputs=*/repl, - /*outputTypes=*/value.getType(), - /*originalType=*/Type(), converter) - .front(); + OpBuilder::InsertPoint ip; + if (repl.empty()) { +ip = computeInsertPoint(value); + } else { +ip = computeInsertPoint(repl); + } + Value castValue = buildUnresolvedMaterialization( +MaterializationKind::Source, ip, value.getLoc(), +/*valuesToMap=*/repl, /*inputs=*/repl, +/*outputTypes=*/value.getType(), +/*originalType=*/Type(), converter, +/*isPureTypeConversion=*/!repl.empty()) +.front(); return castValue; } @@ -1897,21 +1896,8 @@ void ConversionPatternRewriterImpl::replaceOp( } // Create mappings for each of the new result values. - for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) { -if (repl.empty()) { - // This result was dropped and no replacement value was provided. - // Materialize a replacement value "out of thin air". - buildUnresolvedMaterialization( - MaterializationKind::Source, computeInsertPoint(result), - result.getLoc(), /*valuesToMap=*/{result}, /*inputs=*/ValueRange(), - /*outputTypes=*/result.getType(), /*originalType=*/Type(), - currentTypeConverter, /*isPureTypeConversion=*/false); - continue; -} - -// Remap result to replacement value. 
+ for (auto [repl, result] : llvm::zip_equal(newValues, op->getResults())) mapping.map(static_cast<Value>(result), std::move(repl)); - } appendRewrite<ReplaceOperationRewrite>(op, currentTypeConverter); // Mark this operation and all nested ops as replaced. ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
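For pattern authors, the dropped-result case that the "out of thin air" materialization above handles is reached through replaceOpWithMultiple, where an empty range marks a result as dropped. A hedged sketch with hypothetical ops PairOp and FirstOp:

```cpp
// Hedged sketch (illustration only): replace a two-result op, mapping result
// #0 to a new value and dropping result #1. The empty range is what later
// triggers the "out of thin air" source materialization if some pattern
// still asks for the dropped value.
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

struct LowerPairOp : public OpConversionPattern<PairOp> {
  using OpConversionPattern<PairOp>::OpConversionPattern;

  LogicalResult
  matchAndRewrite(PairOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Hypothetical lowering that only produces a value for result #0.
    Value first = rewriter.create<FirstOp>(op.getLoc(), adaptor.getInput());
    rewriter.replaceOpWithMultiple(op, {{first}, {}});
    return success();
  }
};
```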
[llvm-branch-commits] [llvm] [DA] Fix Strong SIV test for symbolic coefficients and deltas (#149977) (PR #157738)
https://github.com/kasuga-fj edited https://github.com/llvm/llvm-project/pull/157738 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [libc++] Test triggering a benchmarking job comment (PR #158138)
ldionne wrote: /libcxx-bot benchmark libcxx/test/benchmarks/join_view.bench.cpp libcxx/test/benchmarks/hash.bench.cpp https://github.com/llvm/llvm-project/pull/158138 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add IR and codegen support for deactivation symbols. (PR #133536)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133536 >From f4c61b403c8a2c649741bae983196922143db44e Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Wed, 10 Sep 2025 18:02:38 -0700 Subject: [PATCH 1/2] Tweak LangRef Created using spr 1.3.6-beta.1 --- llvm/docs/LangRef.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst index 10586f03cff8e..5380413aec892 100644 --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -3098,7 +3098,8 @@ Deactivation Symbol Operand Bundles A ``"deactivation-symbol"`` operand bundle is valid on the following instructions (AArch64 only): -- Call to a normal function with ``notail`` attribute. +- Call to a normal function with ``notail`` attribute and a first argument and + return value of type ``ptr``. - Call to ``llvm.ptrauth.sign`` or ``llvm.ptrauth.auth`` intrinsics. This operand bundle specifies that if the deactivation symbol is defined >From 0c2d97be43360d18f6e674bde048298a450a4bda Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Thu, 11 Sep 2025 12:39:33 -0700 Subject: [PATCH 2/2] Add combine check Created using spr 1.3.6-beta.1 --- .../InstCombine/InstCombineCalls.cpp | 10 +++ .../InstCombine/ptrauth-intrinsics.ll | 28 +++ 2 files changed, 38 insertions(+) diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp index 42b65dde67255..6550c6213dee5 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp @@ -3052,6 +3052,11 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) { } case Intrinsic::ptrauth_auth: case Intrinsic::ptrauth_resign: { +// We don't support this optimization on intrinsic calls with deactivation +// symbols, which are represented using operand bundles. +if (II->hasOperandBundles()) + break; + // (sign|resign) + (auth|resign) can be folded by omitting the middle // sign+auth component if the key and discriminator match. bool NeedSign = II->getIntrinsicID() == Intrinsic::ptrauth_resign; @@ -3063,6 +3068,11 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) { // whatever we replace this sequence with. Value *AuthKey = nullptr, *AuthDisc = nullptr, *BasePtr; if (const auto *CI = dyn_cast<CallBase>(Ptr)) { + // We don't support this optimization on intrinsic calls with deactivation + // symbols, which are represented using operand bundles.
+ if (CI->hasOperandBundles()) +break; + BasePtr = CI->getArgOperand(0); if (CI->getIntrinsicID() == Intrinsic::ptrauth_sign) { if (CI->getArgOperand(1) != Key || CI->getArgOperand(2) != Disc) diff --git a/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll b/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll index 208e162ac9416..09d9649b09cc1 100644 --- a/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll +++ b/llvm/test/Transforms/InstCombine/ptrauth-intrinsics.ll @@ -160,6 +160,34 @@ define i64 @test_ptrauth_resign_ptrauth_constant(ptr %p) { ret i64 %authed } +@ds = external global i8 + +define i64 @test_ptrauth_nop_ds1(ptr %p) { +; CHECK-LABEL: @test_ptrauth_nop_ds1( +; CHECK-NEXT:[[TMP0:%.*]] = ptrtoint ptr [[P:%.*]] to i64 +; CHECK-NEXT:[[SIGNED:%.*]] = call i64 @llvm.ptrauth.sign(i64 [[TMP0]], i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ] +; CHECK-NEXT:[[AUTHED:%.*]] = call i64 @llvm.ptrauth.auth(i64 [[SIGNED]], i32 1, i64 1234) +; CHECK-NEXT:ret i64 [[AUTHED]] +; + %tmp0 = ptrtoint ptr %p to i64 + %signed = call i64 @llvm.ptrauth.sign(i64 %tmp0, i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ] + %authed = call i64 @llvm.ptrauth.auth(i64 %signed, i32 1, i64 1234) + ret i64 %authed +} + +define i64 @test_ptrauth_nop_ds2(ptr %p) { +; CHECK-LABEL: @test_ptrauth_nop_ds2( +; CHECK-NEXT:[[TMP0:%.*]] = ptrtoint ptr [[P:%.*]] to i64 +; CHECK-NEXT:[[SIGNED:%.*]] = call i64 @llvm.ptrauth.sign(i64 [[TMP0]], i32 1, i64 1234) +; CHECK-NEXT:[[AUTHED:%.*]] = call i64 @llvm.ptrauth.auth(i64 [[SIGNED]], i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ] +; CHECK-NEXT:ret i64 [[AUTHED]] +; + %tmp0 = ptrtoint ptr %p to i64 + %signed = call i64 @llvm.ptrauth.sign(i64 %tmp0, i32 1, i64 1234) + %authed = call i64 @llvm.ptrauth.auth(i64 %signed, i32 1, i64 1234) [ "deactivation-symbol"(ptr @ds) ] + ret i64 %authed +} + declare i64 @llvm.ptrauth.auth(i64, i32, i64) declare i64 @llvm.ptrauth.sign(i64, i32, i64) declare i64 @llvm.ptrauth.resign(i64, i32, i64, i32, i64) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
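As background for the new guards, the bundle form these tests exercise can be produced from C++ roughly as follows. A hedged sketch: the helper name, key, and discriminator are illustrative, and real frontends may emit this differently.

```cpp
// Hedged sketch (illustration only): build
//   call i64 @llvm.ptrauth.sign(...) [ "deactivation-symbol"(ptr @ds) ]
// with IRBuilder.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"

using namespace llvm;

static CallInst *emitSignedWithDS(IRBuilder<> &B, Module &M, Value *Ptr,
                                  GlobalValue *DS) {
  Function *Sign =
      Intrinsic::getOrInsertDeclaration(&M, Intrinsic::ptrauth_sign);
  Value *PtrInt = B.CreatePtrToInt(Ptr, B.getInt64Ty()); // raw pointer as i64
  Value *Key = B.getInt32(1);     // illustrative key
  Value *Disc = B.getInt64(1234); // illustrative discriminator
  Value *DSVal = DS;
  OperandBundleDef DSBundle("deactivation-symbol", DSVal);
  return B.CreateCall(Sign, {PtrInt, Key, Disc}, {DSBundle});
}
```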
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/133537 >From e728f3444624a5f47f0af84c21fb3a584f3e05b7 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Fri, 1 Aug 2025 17:27:41 -0700 Subject: [PATCH 1/5] Add verifier check Created using spr 1.3.6-beta.1 --- llvm/lib/IR/Verifier.cpp | 5 + llvm/test/Verifier/ptrauth-constant.ll | 6 ++ 2 files changed, 11 insertions(+) create mode 100644 llvm/test/Verifier/ptrauth-constant.ll diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 3ff9895e161c4..3478c2c450ae7 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2627,6 +2627,11 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) { Check(CPA->getDiscriminator()->getBitWidth() == 64, "signed ptrauth constant discriminator must be i64 constant integer"); + + Check(isa<GlobalValue>(CPA->getDeactivationSymbol()) || +CPA->getDeactivationSymbol()->isNullValue(), +"signed ptrauth constant deactivation symbol must be a global value " +"or null"); } bool Verifier::verifyAttributeCount(AttributeList Attrs, unsigned Params) { diff --git a/llvm/test/Verifier/ptrauth-constant.ll b/llvm/test/Verifier/ptrauth-constant.ll new file mode 100644 index 0..fdd6352cf8469 --- /dev/null +++ b/llvm/test/Verifier/ptrauth-constant.ll @@ -0,0 +1,6 @@ +; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s + +@g = external global i8 + +; CHECK: signed ptrauth constant deactivation symbol must be a global value or null +@ptr = global ptr ptrauth (ptr @g, i32 0, i64 65535, ptr null, ptr inttoptr (i64 16 to ptr)) >From 60e836e71bf9aabe9dade2bda1ca38107f76b599 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Mon, 8 Sep 2025 17:34:59 -0700 Subject: [PATCH 2/5] Address review comment Created using spr 1.3.6-beta.1 --- llvm/lib/IR/Constants.cpp | 1 + llvm/test/Assembler/invalid-ptrauth-const6.ll | 6 ++ 2 files changed, 7 insertions(+) create mode 100644 llvm/test/Assembler/invalid-ptrauth-const6.ll diff --git a/llvm/lib/IR/Constants.cpp b/llvm/lib/IR/Constants.cpp index 5eacc7af1269b..53b292f90c03d 100644 --- a/llvm/lib/IR/Constants.cpp +++ b/llvm/lib/IR/Constants.cpp @@ -2082,6 +2082,7 @@ ConstantPtrAuth::ConstantPtrAuth(Constant *Ptr, ConstantInt *Key, assert(Key->getBitWidth() == 32); assert(Disc->getBitWidth() == 64); assert(AddrDisc->getType()->isPointerTy()); + assert(DeactivationSymbol->getType()->isPointerTy()); setOperand(0, Ptr); setOperand(1, Key); setOperand(2, Disc); diff --git a/llvm/test/Assembler/invalid-ptrauth-const6.ll b/llvm/test/Assembler/invalid-ptrauth-const6.ll new file mode 100644 index 0..6e8e1d386acc8 --- /dev/null +++ b/llvm/test/Assembler/invalid-ptrauth-const6.ll @@ -0,0 +1,6 @@ +; RUN: not llvm-as < %s 2>&1 | FileCheck %s + +@var = global i32 0 + +; CHECK: error: constant ptrauth deactivation symbol must be a pointer +@ptr = global ptr ptrauth (ptr @var, i32 0, i64 65535, ptr null, i64 0) >From a780d181fa69236d5909759a24a1134b50313980 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Tue, 9 Sep 2025 17:18:49 -0700 Subject: [PATCH 3/5] Address review comment Created using spr 1.3.6-beta.1 --- llvm/lib/Bitcode/Reader/BitcodeReader.cpp | 3 +++ llvm/lib/IR/Verifier.cpp | 3 +++ 2 files changed, 6 insertions(+) diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp index 045ed204620fb..04fe4c57af6ed 100644 --- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp +++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp @@ -1613,6 +1613,9 @@ Expected<Value *> BitcodeReader::materializeValue(unsigned
StartValID, ConstOps.size() > 4 ? ConstOps[4] : ConstantPointerNull::get(cast<PointerType>( ConstOps[3]->getType())); + if (!DeactivationSymbol->getType()->isPointerTy()) +return error( +"ptrauth deactivation symbol operand must be a pointer"); C = ConstantPtrAuth::get(ConstOps[0], Key, Disc, ConstOps[3], DeactivationSymbol); diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp index 9e44dfb387615..a53ba17e26011 100644 --- a/llvm/lib/IR/Verifier.cpp +++ b/llvm/lib/IR/Verifier.cpp @@ -2632,6 +2632,9 @@ void Verifier::visitConstantPtrAuth(const ConstantPtrAuth *CPA) { Check(CPA->getDiscriminator()->getBitWidth() == 64, "signed ptrauth constant discriminator must be i64 constant integer"); + Check(CPA->getDeactivationSymbol()->getType()->isPointerTy(), +"signed ptrauth constant deactivation symbol must be a pointer"); + Check(isa<GlobalValue>(CPA->getDeactivationSymbol()) || CPA->getDeactivationSymbol()->isNullValue(), "signed ptrauth constant deactivation symbol must be a global value " >From 51c353bbde24f940e3dfd7488aec0682dbef260b Mon Se
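In API form, a constant satisfying both new checks could be built as in this hedged sketch, using the five-operand ConstantPtrAuth::get this PR introduces; the helper name and the key/discriminator values are illustrative.

```cpp
// Hedged sketch (illustration only): a signed pointer constant whose fifth
// operand is a deactivation symbol, i.e. pointer-typed and a global value
// (null would also satisfy the verifier).
#include "llvm/IR/Constants.h"
#include "llvm/IR/Module.h"

using namespace llvm;

static Constant *signedPtrWithDS(Module &M, Constant *Callee,
                                 GlobalVariable *DS) {
  LLVMContext &Ctx = M.getContext();
  auto *Key = ConstantInt::get(Type::getInt32Ty(Ctx), 0);
  auto *Disc = ConstantInt::get(Type::getInt64Ty(Ctx), 65535);
  auto *NoAddrDisc = ConstantPointerNull::get(PointerType::get(Ctx, 0));
  return ConstantPtrAuth::get(Callee, Key, Disc, NoAddrDisc, DS);
}
```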
[llvm-branch-commits] [llvm] [libc++] Test triggering a benchmarking job comment (PR #158138)
https://github.com/ldionne closed https://github.com/llvm/llvm-project/pull/158138 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
ojhunt wrote: > This isn't possible, the symbols are resolved at static link time. See the > RFC for more information: > https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Oh wait, I had completely misunderstood that - I had always assumed dynamic linking, which is the reason for a bunch of the concerns I raised that I now assume sounded really weird :D https://github.com/llvm/llvm-project/pull/133537 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)
https://github.com/jdenny-ornl edited https://github.com/llvm/llvm-project/pull/128785 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
@@ -2135,6 +2135,11 @@ bool ConstantPtrAuth::hasSpecialAddressDiscriminator(uint64_t Value) const { bool ConstantPtrAuth::isKnownCompatibleWith(const Value *Key, const Value *Discriminator, const DataLayout &DL) const { + // This function may only be validly called to analyze a ptrauth operation with + // no deactivation symbol, so if we have one it isn't compatible. + if (!getDeactivationSymbol()->isNullValue()) ojhunt wrote: Sigh, IR vs clang again - I was thinking about this in the context of qualified type compatibility. Sigh. https://github.com/llvm/llvm-project/pull/133537 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] Add pointer field protection feature. (PR #133538)
ojhunt wrote: > @pcc and I have been discussing this. > > * The perf issues I was concerned about were predicated on access to a > pointer loaded from a field continuing to be checked after the original field > load; this is not the case (and in hindsight doing so would imply that passing the > pointer as a parameter to a function would maintain the tag and require the > target to know about it). For people following along: despite multiple different places saying the symbol resolution is static, I'm a muppet and thought this was a dynamic-link check, hence all sorts of problems. It's actually a static link-time check, so many of my concerns are irrelevant. https://github.com/llvm/llvm-project/pull/133538 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Clang] Port ulimit tests to work with internal shell (PR #157977)
https://github.com/ilovepi approved this pull request. https://github.com/llvm/llvm-project/pull/157977 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [lit] Implement ulimit builtin (PR #157958)
https://github.com/ilovepi approved this pull request. https://github.com/llvm/llvm-project/pull/157958 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits