r374042 - [SVE][IR] Scalable Vector size queries and IR instruction support
Author: huntergr
Date: Tue Oct 8 05:53:54 2019
New Revision: 374042

URL: http://llvm.org/viewvc/llvm-project?rev=374042&view=rev
Log:
[SVE][IR] Scalable Vector size queries and IR instruction support

* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is an integer
  multiple of that size
* Converts existing size query functions from Type.h and DataLayout.h
  to return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

Modified:
    cfe/trunk/lib/CodeGen/CGCall.cpp
    cfe/trunk/lib/CodeGen/CGStmt.cpp
    cfe/trunk/lib/CodeGen/CodeGenFunction.cpp

Modified: cfe/trunk/lib/CodeGen/CGCall.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCall.cpp?rev=374042&r1=374041&r2=374042&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGCall.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGCall.cpp Tue Oct 8 05:53:54 2019
@@ -4277,8 +4277,8 @@ RValue CodeGenFunction::EmitCall(const C
   // Update the largest vector width if any arguments have vector types.
   for (unsigned i = 0; i < IRCallArgs.size(); ++i) {
     if (auto *VT = dyn_cast<llvm::VectorType>(IRCallArgs[i]->getType()))
-      LargestVectorWidth = std::max(LargestVectorWidth,
-                                    VT->getPrimitiveSizeInBits());
+      LargestVectorWidth = std::max((uint64_t)LargestVectorWidth,
+                                    VT->getPrimitiveSizeInBits().getFixedSize());
   }

   // Compute the calling convention and attributes.
@@ -4361,8 +4361,8 @@ RValue CodeGenFunction::EmitCall(const C
   // Update largest vector width from the return type.
   if (auto *VT = dyn_cast<llvm::VectorType>(CI->getType()))
-    LargestVectorWidth = std::max(LargestVectorWidth,
-                                  VT->getPrimitiveSizeInBits());
+    LargestVectorWidth = std::max((uint64_t)LargestVectorWidth,
+                                  VT->getPrimitiveSizeInBits().getFixedSize());

   // Insert instrumentation or attach profile metadata at indirect call sites.
   // For more details, see the comment before the definition of

Modified: cfe/trunk/lib/CodeGen/CGStmt.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGStmt.cpp?rev=374042&r1=374041&r2=374042&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGStmt.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGStmt.cpp Tue Oct 8 05:53:54 2019
@@ -2073,8 +2073,8 @@ void CodeGenFunction::EmitAsmStmt(const
       // Update largest vector width for any vector types.
       if (auto *VT = dyn_cast<llvm::VectorType>(ResultRegTypes.back()))
-        LargestVectorWidth = std::max(LargestVectorWidth,
-                                      VT->getPrimitiveSizeInBits());
+        LargestVectorWidth = std::max((uint64_t)LargestVectorWidth,
+                                      VT->getPrimitiveSizeInBits().getFixedSize());
     } else {
       ArgTypes.push_back(Dest.getAddress().getType());
       Args.push_back(Dest.getPointer());
@@ -2098,8 +2098,8 @@ void CodeGenFunction::EmitAsmStmt(const
       // Update largest vector width for any vector types.
       if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType()))
-        LargestVectorWidth = std::max(LargestVectorWidth,
-                                      VT->getPrimitiveSizeInBits());
+        LargestVectorWidth = std::max((uint64_t)LargestVectorWidth,
+                                      VT->getPrimitiveSizeInBits().getFixedSize());
       if (Info.allowsRegister())
         InOutConstraints += llvm::utostr(i);
       else
@@ -2185,8 +2185,8 @@ void CodeGenFunction::EmitAsmStmt(const
       // Update largest vector width for any vector types.
       if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType()))
-        LargestVectorWidth = std::max(LargestVectorWidth,
-                                      VT->getPrimitiveSizeInBits());
+        LargestVectorWidth = std::max((uint64_t)LargestVectorWidth,
+                                      VT->getPrimitiveSizeInBits().getFixedSize());
       ArgTypes.push_back(Arg->getType());
       Args.push_back(Arg);

Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=374042&r1=374041&r2=374042&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original)
+++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Tue Oct 8 05:53:54 2019
@@ -431,13 +431,13 @@ void CodeGenFunction::F
[flang] [libc] [llvm] [clang-tools-extra] [clang] [lldb] [libcxx] [compiler-rt] [InstCombine] Fold converted urem to 0 if there's no overlapping bits (PR #71528)
https://github.com/huntergr-arm updated https://github.com/llvm/llvm-project/pull/71528

>From 754519ad9b37343c827504e7d6bfcfa590f69483 Mon Sep 17 00:00:00 2001
From: Graham Hunter
Date: Fri, 3 Nov 2023 14:22:57 +
Subject: [PATCH] [InstCombine] Fold converted urem to 0 if there's no
 overlapping bits

When folding urem instructions we can end up not recognizing that the
output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces it with 0 when x is a non-zero
power of 2 and N >= M.
---
 .../InstCombine/InstCombineAndOrXor.cpp       | 10
 .../InstCombine/po2-shift-add-and-to-zero.ll  | 52 +++
 2 files changed, 62 insertions(+)
 create mode 100644 llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 46af9bf5eed003a..da38f8039dbc3ca 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2662,6 +2662,16 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
   if (sinkNotIntoOtherHandOfLogicalOp(I))
     return &I;

+  // (x << N) & (add (x << M), -1) --> 0, where x is known to be a non-zero
+  // power of 2 and M <= N.
+  const APInt *Shift1, *Shift2;
+  if (match(&I, m_c_And(m_OneUse(m_Shl(m_Value(X), m_APInt(Shift1))),
+                        m_OneUse(m_Add(m_Shl(m_Value(Y), m_APInt(Shift2)),
+                                       m_AllOnes())))) &&
+      X == Y && isKnownToBeAPowerOfTwo(X, /*OrZero*/ false, 0, &I) &&
+      Shift1->uge(*Shift2))
+    return replaceInstUsesWith(I, Constant::getNullValue(I.getType()));
+
   // An and recurrence w/loop invariant step is equivelent to (and start, step)
   PHINode *PN = nullptr;
   Value *Start = nullptr, *Step = nullptr;
diff --git a/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
new file mode 100644
index 000..4979e7a01972299
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
@@ -0,0 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -mtriple unknown -passes=instcombine -S < %s | FileCheck %s
+
+;; The and X, (add Y, -1) pattern is from an earlier instcombine pass which
+;; converted
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.aarch64.sve.cntb(i32 31)
+;;   %1 = call i64 @llvm.aarch64.sve.cnth(i32 31)
+;;   %rem = urem i64 %0, %1
+;;   ret i64 %rem
+;; }
+
+;; into
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.vscale.i64()
+;;   %1 = shl nuw nsw i64 %0, 4
+;;   %2 = call i64 @llvm.vscale.i64()
+;;   %3 = shl nuw nsw i64 %2, 3
+;;   %4 = add nsw i64 %3, -1
+;;   %rem = and i64 %1, %4
+;;   ret i64 %rem
+;; }
+
+;; InstCombine would have folded the original to returning 0 if the vscale
+;; calls were the same Value*, but since there's two of them it doesn't
+;; work and we convert the urem to add/and. CSE then gets rid of the extra
+;; vscale, leaving us with a new pattern to match. This only works because
+;; vscale is known to be a nonzero power of 2 (assuming there's a defined
+;; range for it).
+
+define dso_local i64 @f1() local_unnamed_addr #0 {
+; CHECK-LABEL: define dso_local i64 @f1
+; CHECK-SAME: () local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  %0 = call i64 @llvm.vscale.i64()
+  %1 = shl nuw nsw i64 %0, 4
+  %2 = shl nuw nsw i64 %0, 3
+  %3 = add nsw i64 %2, -1
+  %rem = and i64 %1, %3
+  ret i64 %rem
+}
+
+declare i64 @llvm.vscale.i64()
+
+attributes #0 = { vscale_range(1,16) }

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] a83ce95 - [clang] Remove unused capture in closure
Author: Graham Hunter
Date: 2021-06-22T15:09:39+01:00
New Revision: a83ce95b097689f4ebd2c3b45c0778d0b6946d00

URL: https://github.com/llvm/llvm-project/commit/a83ce95b097689f4ebd2c3b45c0778d0b6946d00
DIFF: https://github.com/llvm/llvm-project/commit/a83ce95b097689f4ebd2c3b45c0778d0b6946d00.diff

LOG: [clang] Remove unused capture in closure

c6a91ee6 removed uses of IsMonotonic from OpenMP SIMD codegen, but that
left a capture of the variable unused, which upset buildbots using
-Werror.

Added:

Modified:
    clang/lib/CodeGen/CGStmtOpenMP.cpp

Removed:

diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 3b2b70f388cc..ba497a5b9d3a 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -3197,7 +3197,7 @@ bool CodeGenFunction::EmitOMPWorksharingLoop(
       getJumpDestInCurrentScope(createBasicBlock("omp.loop.exit"));
   emitCommonSimdLoop(
       *this, S,
-      [&S, IsMonotonic](CodeGenFunction &CGF, PrePostActionTy &) {
+      [&S](CodeGenFunction &CGF, PrePostActionTy &) {
        if (isOpenMPSimdDirective(S.getDirectiveKind())) {
          CGF.EmitOMPSimdInit(S);
        } else if (const auto *C = S.getSingleClause()) {
[clang] ad49765 - [OpenMP] Allow const parameters in declare simd linear clause
Author: Graham Hunter
Date: 2020-03-02T14:54:14Z
New Revision: ad497658d25a3616e4c57cf7d12e3497a1c66f35

URL: https://github.com/llvm/llvm-project/commit/ad497658d25a3616e4c57cf7d12e3497a1c66f35
DIFF: https://github.com/llvm/llvm-project/commit/ad497658d25a3616e4c57cf7d12e3497a1c66f35.diff

LOG: [OpenMP] Allow const parameters in declare simd linear clause

Reviewers: ABataev, kkwli0, jdoerfert, fpetrogalli

Reviewed By: ABataev, fpetrogalli

Differential Revision: https://reviews.llvm.org/D75350

Added:

Modified:
    clang/include/clang/Sema/Sema.h
    clang/lib/Sema/SemaOpenMP.cpp
    clang/test/OpenMP/declare_simd_aarch64.c
    clang/test/OpenMP/declare_simd_codegen.cpp

Removed:

diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 2d7676e23cfc..f1dfe411983a 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -10173,7 +10173,8 @@ class Sema final {
   /// Checks that the specified declaration matches requirements for the linear
   /// decls.
   bool CheckOpenMPLinearDecl(const ValueDecl *D, SourceLocation ELoc,
-                             OpenMPLinearClauseKind LinKind, QualType Type);
+                             OpenMPLinearClauseKind LinKind, QualType Type,
+                             bool IsDeclareSimd = false);

   /// Called on well-formed '\#pragma omp declare simd' after parsing of
   /// the associated method/function.
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 6aaf70918d02..de732577c81b 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -5269,7 +5269,8 @@ Sema::DeclGroupPtrTy Sema::ActOnOpenMPDeclareSimdDirective(
             E->containsUnexpandedParameterPack())
           continue;
         (void)CheckOpenMPLinearDecl(CanonPVD, E->getExprLoc(), LinKind,
-                                    PVD->getOriginalType());
+                                    PVD->getOriginalType(),
+                                    /*IsDeclareSimd=*/true);
         continue;
       }
     }
@@ -5289,7 +5290,7 @@ Sema::DeclGroupPtrTy Sema::ActOnOpenMPDeclareSimdDirective(
           E->isInstantiationDependent() || E->containsUnexpandedParameterPack())
         continue;
       (void)CheckOpenMPLinearDecl(/*D=*/nullptr, E->getExprLoc(), LinKind,
-                                  E->getType());
+                                  E->getType(), /*IsDeclareSimd=*/true);
       continue;
     }
     Diag(E->getExprLoc(), diag::err_omp_param_or_this_in_clause)
@@ -14547,8 +14548,8 @@ bool Sema::CheckOpenMPLinearModifier(OpenMPLinearClauseKind LinKind,
 }

 bool Sema::CheckOpenMPLinearDecl(const ValueDecl *D, SourceLocation ELoc,
-                                 OpenMPLinearClauseKind LinKind,
-                                 QualType Type) {
+                                 OpenMPLinearClauseKind LinKind, QualType Type,
+                                 bool IsDeclareSimd) {
   const auto *VD = dyn_cast_or_null<VarDecl>(D);
   // A variable must not have an incomplete type or a reference type.
   if (RequireCompleteType(ELoc, Type, diag::err_omp_linear_incomplete_type))
@@ -14564,8 +14565,10 @@ bool Sema::CheckOpenMPLinearDecl(const ValueDecl *D, SourceLocation ELoc,
   // OpenMP 5.0 [2.19.3, List Item Privatization, Restrictions]
   // A variable that is privatized must not have a const-qualified type
   // unless it is of class type with a mutable member. This restriction does
-  // not apply to the firstprivate clause.
+  // not apply to the firstprivate clause, nor to the linear clause on
+  // declarative directives (like declare simd).
+  if (!IsDeclareSimd &&
+      rejectConstNotMutableType(*this, D, Type, OMPC_linear, ELoc))
     return true;

   // A list item must be of integral or pointer type.
diff --git a/clang/test/OpenMP/declare_simd_aarch64.c b/clang/test/OpenMP/declare_simd_aarch64.c
index eff0eed07dfe..4af2ad9bb603 100644
--- a/clang/test/OpenMP/declare_simd_aarch64.c
+++ b/clang/test/OpenMP/declare_simd_aarch64.c
@@ -116,6 +116,15 @@ double c02(double *x, char y);
 // AARCH64: "_ZGVnM16uv_c02" "_ZGVnM8uv_c02"
 // AARCH64-NOT: c02

+/*************************************/
+/* Linear with a constant parameter  */
+/*************************************/
+
+#pragma omp declare simd notinbranch linear(i)
+double constlinear(const int i);
+// AARCH64: "_ZGVnN2l_constlinear" "_ZGVnN4l_constlinear"
+// AARCH64-NOT: constlinear
+
 /*************************************/
 /* sincos-like signature             */
 /*************************************/
@@ -170,6 +179,7 @@ void do_something() {
   D = b03(D);
   *I = c01(D, *S);
   *D = c02(D, *S);
+  constlinear(*I);
   sincos(*D, D, D);
   SinCos(*D, D, D);
   foo2(I, *I);
diff --git a/clang/test/OpenMP/declare_simd_codegen.cpp b/clang/test/OpenMP/declare_simd_co
[clang] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -2778,6 +2808,60 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif

+void VPAliasLaneMaskRecipe::execute(VPTransformState &State) {
+  IRBuilderBase Builder = State.Builder;
+  Value *SinkValue = State.get(getSinkValue(), 0, true);
+  Value *SourceValue = State.get(getSourceValue(), 0, true);
+
+  unsigned ElementSize = 0;
+  auto *ReadInsn = cast<Instruction>(SourceValue);
+  auto *ReadCast = dyn_cast<CastInst>(SourceValue);
+  if (ReadInsn->getOpcode() == Instruction::Add)
+    ReadCast = dyn_cast<CastInst>(ReadInsn->getOperand(0));
+
+  if (ReadCast && ReadCast->getOpcode() == Instruction::PtrToInt) {
+    Value *Ptr = ReadCast->getOperand(0);
+    for (auto *Use : Ptr->users()) {
+      if (auto *GEP = dyn_cast<GetElementPtrInst>(Use)) {
+        auto *EltVT = GEP->getSourceElementType();
+        if (EltVT->isArrayTy())
+          ElementSize = EltVT->getArrayElementType()->getScalarSizeInBits() *
+                        EltVT->getArrayNumElements();
+        else
+          ElementSize = GEP->getSourceElementType()->getScalarSizeInBits() / 8;
+        break;
+      }
+    }
+  }
+  assert(ElementSize > 0 && "Couldn't get element size from pointer");

huntergr-arm wrote:

This assert fires when compiling
`MicroBenchmarks/ImageProcessing/Dilate/CMakeFiles/Dilate.dir/dilateKernel.c`
from the test suite, so that needs fixing at least.

But I think we shouldn't be trying to figure out the element size at recipe
execution time; it should be determined when trying to build the recipe and
the value stored as part of the class. If we can't find the size then we
stop trying to build the recipe and fall back to normal RT checks. Does
that sound sensible to you?

https://github.com/llvm/llvm-project/pull/100579
[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)
@@ -25030,7 +25030,8 @@ static SDValue foldCSELofLASTB(SDNode *Op, SelectionDAG &DAG) {
   if (AnyPred.getOpcode() == AArch64ISD::REINTERPRET_CAST)
     AnyPred = AnyPred.getOperand(0);

-  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE)
+  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE &&
+      !ISD::isConstantSplatVectorAllOnes(TruePred.getNode()))

huntergr-arm wrote:

I agree with Paul that we can remove the opcode check; I have a vague memory
of looking for a helper function to determine whether the mask was all-true
(or better, whether the ptest was redundant), not finding one, and just
sticking a check for PTRUE in to get my tests up and running. Sadly I didn't
revisit that before committing.

https://github.com/llvm/llvm-project/pull/139236
[clang] [llvm] [LLVM][IRBuilder] Use NUW arithmetic for Create{ElementCount,TypeSize}. (PR #143532)
https://github.com/huntergr-arm approved this pull request.

LGTM; I do think it's worth documenting the assumption/requirement about a
sufficiently large type in the comments on the declarations for
CreateElementCount and CreateTypeSize.

https://github.com/llvm/llvm-project/pull/143532