[clang] 38d18d9 - [SVE] Add support to vectorize_width loop pragma for scalable vectors

2021-01-08 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2021-01-08T11:37:27Z
New Revision: 38d18d93534d290d045bbbfa86337e70f1139dc2

URL: 
https://github.com/llvm/llvm-project/commit/38d18d93534d290d045bbbfa86337e70f1139dc2
DIFF: 
https://github.com/llvm/llvm-project/commit/38d18d93534d290d045bbbfa86337e70f1139dc2.diff

LOG: [SVE] Add support to vectorize_width loop pragma for scalable vectors

This patch adds support for two new variants of the vectorize_width
pragma:

1. vectorize_width(X[, fixed|scalable]), where the optional second
parameter indicates whether the user wishes to use fixed width or
scalable vectorization. For example, the user can now write something
like:

  #pragma clang loop vectorize_width(4, fixed)
or
  #pragma clang loop vectorize_width(4, scalable)

In the absence of a second parameter it is assumed the user wants
fixed width vectorization, in order to maintain compatibility with
existing code.
2. vectorize_width(fixed|scalable), where the width is left unspecified
but the user hints at the type of vectorization they prefer, either
fixed width or scalable.

I have implemented this by making use of the LLVM loop hint attribute:

  llvm.loop.vectorize.scalable.enable
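
As a minimal sketch of how the two pieces fit together (the C++ loop is
an invented example; llvm.loop.vectorize.width is the pre-existing width
hint, and the metadata shape shown is only approximate):

  void scale(float *a, const float *b, int n) {
  #pragma clang loop vectorize_width(4, scalable)
    for (int i = 0; i < n; ++i)
      a[i] = 2.0f * b[i];
  }

  ; Loop metadata attached to the loop latch, roughly:
  ;   !{!"llvm.loop.vectorize.width", i32 4}
  ;   !{!"llvm.loop.vectorize.scalable.enable", i1 true}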

Tests were added to

  clang/test/CodeGenCXX/pragma-loop.cpp

for both the 'fixed' and 'scalable' forms of the optional parameter.

See this thread for context: 
http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html

Differential Revision: https://reviews.llvm.org/D89031

Added: 


Modified: 
clang/docs/LanguageExtensions.rst
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/DiagnosticParseKinds.td
clang/lib/AST/AttrImpl.cpp
clang/lib/CodeGen/CGLoopInfo.cpp
clang/lib/CodeGen/CGLoopInfo.h
clang/lib/Parse/ParsePragma.cpp
clang/lib/Sema/SemaStmtAttr.cpp
clang/test/AST/ast-print-pragmas.cpp
clang/test/CodeGenCXX/pragma-loop-pr27643.cpp
clang/test/CodeGenCXX/pragma-loop.cpp
clang/test/Parser/pragma-loop.cpp

Removed: 




diff  --git a/clang/docs/LanguageExtensions.rst 
b/clang/docs/LanguageExtensions.rst
index 0c01a2bbc52b..6fa6c55b15fc 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -3107,8 +3107,18 @@ manually enable vectorization or interleaving.
 ...
   }
 
-The vector width is specified by ``vectorize_width(_value_)`` and the
-interleave count is specified by ``interleave_count(_value_)``, where
+The vector width is specified by
+``vectorize_width(_value_[, fixed|scalable])``, where _value_ is a positive
+integer and the type of vectorization can be specified with an optional
+second parameter. The default for the second parameter is 'fixed' and
+refers to fixed width vectorization, whereas 'scalable' indicates the
+compiler should use scalable vectors instead. Another use of vectorize_width
+is ``vectorize_width(fixed|scalable)`` where the user can hint at the type
+of vectorization to use without specifying the exact width. In both variants
+of the pragma the vectorizer may decide to fall back on fixed width
+vectorization if the target does not support scalable vectors.
+
+The interleave count is specified by ``interleave_count(_value_)``, where
 _value_ is a positive integer. This is useful for specifying the optimal
 width/count of the set of target architectures supported by your application.
 

diff  --git a/clang/include/clang/Basic/Attr.td 
b/clang/include/clang/Basic/Attr.td
index b84e6a14f371..248409946123 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -3375,8 +3375,10 @@ def LoopHint : Attr {
"PipelineDisabled", "PipelineInitiationInterval", 
"Distribute",
"VectorizePredicate"]>,
   EnumArgument<"State", "LoopHintState",
-   ["enable", "disable", "numeric", "assume_safety", 
"full"],
-   ["Enable", "Disable", "Numeric", "AssumeSafety", 
"Full"]>,
+   ["enable", "disable", "numeric", "fixed_width",
+"scalable_width", "assume_safety", "full"],
+   ["Enable", "Disable", "Numeric", "FixedWidth",
+"ScalableWidth", "AssumeSafety", "Full"]>,
   ExprArgument<"Value">];
 
   let AdditionalMembers = [{

diff  --git a/clang/include/clang/Basic/DiagnosticParseKinds.td 
b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 8f78bbfc4e70..0ed80a481e78 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1396,6 +1396,12 @@ def err_pragma_loop_invalid_option : Error<
   "%select{invalid|missing}0 option%select{ %1|}0; expected vectorize, "
   "vectorize_width, interleave, interleave_count, unroll, unroll_count, "
   "pipeline, pipeline_initiation_interval, vectorize_predicate, or 

[clang] f4257c5 - [SVE] Make ElementCount members private

2020-08-28 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-08-28T14:43:53+01:00
New Revision: f4257c5832aa51e960e7351929ca3d37031985b7

URL: 
https://github.com/llvm/llvm-project/commit/f4257c5832aa51e960e7351929ca3d37031985b7
DIFF: 
https://github.com/llvm/llvm-project/commit/f4257c5832aa51e960e7351929ca3d37031985b7.diff

LOG: [SVE] Make ElementCount members private

This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
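
For illustration, a minimal sketch of the API change (VTy stands for any
llvm::VectorType*, here assumed to have type <vscale x 4 x i32>):

  llvm::ElementCount EC = VTy->getElementCount();
  // Before this patch the members were read directly (EC.Min, EC.Scalable);
  // now the accessors must be used instead.
  unsigned MinElts = EC.getKnownMinValue(); // 4; runtime count is 4 * vscale
  bool IsScalable = EC.isScalable();        // true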

Differential Revision: https://reviews.llvm.org/D86065

Added: 


Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/CodeGen/CGDebugInfo.cpp
clang/lib/CodeGen/CodeGenTypes.cpp
llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/VectorUtils.h
llvm/include/llvm/CodeGen/ValueTypes.h
llvm/include/llvm/IR/DataLayout.h
llvm/include/llvm/IR/DerivedTypes.h
llvm/include/llvm/IR/Instructions.h
llvm/include/llvm/Support/MachineValueType.h
llvm/include/llvm/Support/TypeSize.h
llvm/lib/Analysis/InstructionSimplify.cpp
llvm/lib/Analysis/VFABIDemangling.cpp
llvm/lib/Analysis/ValueTracking.cpp
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
llvm/lib/CodeGen/CodeGenPrepare.cpp
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/lib/CodeGen/TargetLoweringBase.cpp
llvm/lib/CodeGen/ValueTypes.cpp
llvm/lib/IR/AsmWriter.cpp
llvm/lib/IR/ConstantFold.cpp
llvm/lib/IR/Constants.cpp
llvm/lib/IR/Core.cpp
llvm/lib/IR/DataLayout.cpp
llvm/lib/IR/Function.cpp
llvm/lib/IR/IRBuilder.cpp
llvm/lib/IR/Instructions.cpp
llvm/lib/IR/IntrinsicInst.cpp
llvm/lib/IR/Type.cpp
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
llvm/lib/Transforms/Utils/FunctionComparator.cpp
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/lib/Transforms/Vectorize/VPlan.cpp
llvm/lib/Transforms/Vectorize/VPlan.h
llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
llvm/unittests/IR/VectorTypesTest.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 69899fd8e668..1192fbdc1c9d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -8457,7 +8457,8 @@ Value 
*CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
   case SVE::BI__builtin_sve_svlen_u64: {
 SVETypeFlags TF(Builtin->TypeModifier);
 auto VTy = cast(getSVEType(TF));
-auto NumEls = llvm::ConstantInt::get(Ty, VTy->getElementCount().Min);
+auto *NumEls =
+llvm::ConstantInt::get(Ty, VTy->getElementCount().getKnownMinValue());
 
 Function *F = CGM.getIntrinsic(Intrinsic::vscale, Ty);
 return Builder.CreateMul(NumEls, Builder.CreateCall(F));

diff  --git a/clang/lib/CodeGen/CGDebugInfo.cpp 
b/clang/lib/CodeGen/CGDebugInfo.cpp
index d90cffb4bb95..8a85a24910e4 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -726,7 +726,7 @@ llvm::DIType *CGDebugInfo::CreateType(const BuiltinType 
*BT) {
 {
   ASTContext::BuiltinVectorTypeInfo Info =
   CGM.getContext().getBuiltinVectorTypeInfo(BT);
-  unsigned NumElemsPerVG = (Info.EC.Min * Info.NumVectors) / 2;
+  unsigned NumElemsPerVG = (Info.EC.getKnownMinValue() * Info.NumVectors) / 2;
 
   // Debuggers can't extract 1bit from a vector, so will display a
   // bitpattern for svbool_t instead.

diff  --git a/clang/lib/CodeGen/CodeGenTypes.cpp 
b/clang/lib/CodeGen/CodeGenTypes.cpp
index 9c072d416075..aede8a53ba90 100644
--- a/clang/lib/CodeGen/CodeGenTypes.cpp
+++ b/clang/lib/CodeGen/CodeGenTypes.cpp
@@ -586,7 +586,8 @@ llvm::Type *CodeGenTypes::ConvertType(QualType T) {
   ASTContext::BuiltinVectorTypeInfo Info =
   Context.getBuiltinVectorTypeInfo(cast(Ty));
   return llvm::ScalableVectorType::get(ConvertType(Info.ElementType),
-   Info.EC.Min * Info.NumVectors);
+   Info.EC.getKnownMinValue() *
+   Info.NumVectors);
 }
 case BuiltinType::Dependent:
 #define BUILTIN_TYPE(Id, SingletonId)

diff  --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h 
b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index a3e624842700..ffbec74c61d0 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -130,8 +130,8 @@ class IntrinsicCostAttributes {
   unsigned Fact

[clang] ae47d15 - Remove "rm -f" workaround in acle_sve_adda.c

2020-06-26 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-06-26T08:16:40+01:00
New Revision: ae47d158a096abad43d8f9056518d83b66c5a4b7

URL: 
https://github.com/llvm/llvm-project/commit/ae47d158a096abad43d8f9056518d83b66c5a4b7
DIFF: 
https://github.com/llvm/llvm-project/commit/ae47d158a096abad43d8f9056518d83b66c5a4b7.diff

LOG: Remove "rm -f" workaround in acle_sve_adda.c

Added: 


Modified: 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_adda.c

Removed: 




diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_adda.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_adda.c
index 9d9c33a891cd..853da8783faa 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_adda.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_adda.c
@@ -1,5 +1,4 @@
 // REQUIRES: aarch64-registered-target
-// RUN: rm -f -- %S/acle_sve_adda.s
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | 
FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t





[clang] c02332a - [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR

2020-06-30 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-06-30T08:11:41+01:00
New Revision: c02332a69399a82244298f0097bc98fafdeb3042

URL: 
https://github.com/llvm/llvm-project/commit/c02332a69399a82244298f0097bc98fafdeb3042
DIFF: 
https://github.com/llvm/llvm-project/commit/c02332a69399a82244298f0097bc98fafdeb3042.diff

LOG: [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR

Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.
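
The distinction is that getVectorNumElements() is only meaningful for
fixed-width vectors and warns for scalable ones, whereas
getVectorMinNumElements() is defined for both. A minimal sketch, with VT
standing for any vector EVT (e.g. nxv4i32):

  // Before (warns when VT is a scalable type):
  //   unsigned NumElts = VT.getVectorNumElements();
  // After (well defined for fixed and scalable vectors):
  unsigned MinElts = VT.getVectorMinNumElements();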

Differential Revision: https://reviews.llvm.org/D82746

Added: 


Modified: 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st3.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st4.c
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Removed: 




diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c
index 788bad9022b5..7beb191cab30 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get2.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | 
FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README 
for instructions on how to resolve it.
+// ASM-NOT: warning
#include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS

diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c
index 502f22d84210..63e17c3e1e0f 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get3.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | 
FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README 
for instructions on how to resolve it.
+// ASM-NOT: warning
#include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS

diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
index 399fa187e83a..a34f41ff3b40 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_get4.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | 
FileCheck %s
 // RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall 
-emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
+// RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
 
+// If this check fails please read test/CodeGen/aarch64-sve-intrinsics/README 
for instructions on how to resolve it.
+// ASM-NOT: warning
#include <arm_sve.h>
 
 #ifdef SVE_OVERLOADED_FORMS

diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
index 7170756d7a98..de21c59bb3b7 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_st2.c
@@ -1,6 +1,11 @@
+// REQUIRES: aarch64-registered-target
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o

[clang] 9a1a7d8 - [SVE] Add more warnings checks to clang and LLVM SVE tests

2020-07-07 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-07-07T09:33:20+01:00
New Revision: 9a1a7d888b53ebe5a934a8193de37da86e276f1e

URL: 
https://github.com/llvm/llvm-project/commit/9a1a7d888b53ebe5a934a8193de37da86e276f1e
DIFF: 
https://github.com/llvm/llvm-project/commit/9a1a7d888b53ebe5a934a8193de37da86e276f1e.diff

LOG: [SVE] Add more warnings checks to clang and LLVM SVE tests

There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
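
The added checks all follow the same pattern, visible in the diffs
below: compile to assembly with stderr captured in a temporary file,
then verify that the captured output contains no warnings (the '...'
stands for each test's existing cc1 options):

  // RUN: %clang_cc1 ... -S -O1 -Werror -Wall -o - %s >/dev/null 2>%t
  // RUN: FileCheck --check-prefix=ASM --allow-empty %s <%t
  // ASM-NOT: warning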

Differential Revision: https://reviews.llvm.org/D82943

Added: 


Modified: 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acge.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acgt.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_acle.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_aclt.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpeq.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpge.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpgt.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmple.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmplt.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpne.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cmpuo.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dup.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_index.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sb.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1sw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1ub.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1uw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sb.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1sw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1ub.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldff1uw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sb.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1sw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1ub.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uh.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ldnf1uw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_pnext.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ptrue.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_rev.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_setffr.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn1.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_trn2.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_undef.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpkhi.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_unpklo.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp1.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_uzp2.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilele.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_whilelt.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip1.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_zip2.c
llvm/test/CodeGen/AArch64/sve-callbyref-notailcall.ll
llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll
llvm/test/CodeGen/AArch64/sve-fcmp.ll
llvm/test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll
llvm/test/CodeGen/AArch64/sve-gep.ll
llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-scaled-offsets.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-32bit-unscaled-offsets.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-scaled-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-64bit-unscaled-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-imm-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-ff-gather-loads-vector-base-scalar-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-scaled-offsets.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-32bit-unscaled-offsets.ll
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-scaled-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-unscaled-offset.ll

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll

llvm/test/CodeGen/AArch64/sve

[clang] bafdd11 - [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy

2020-09-28 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-09-28T08:03:00+01:00
New Revision: bafdd11326a46421b68f68c794fd189c77a32e15

URL: 
https://github.com/llvm/llvm-project/commit/bafdd11326a46421b68f68c794fd189c77a32e15
DIFF: 
https://github.com/llvm/llvm-project/commit/bafdd11326a46421b68f68c794fd189c77a32e15.diff

LOG: [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy

After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer types. However, division for
scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why, when performing division, it's important that the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explicitly call divideCoefficientBy() on the class to perform the
operation.
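
A minimal sketch of the new pattern (EltCnt stands for any
llvm::ElementCount, here assumed to be vscale x 8):

  assert(EltCnt.isKnownEven() && "expected an even element count");
  // Only the known-minimum coefficient is divided; the vscale factor,
  // if any, is untouched: vscale x 8 becomes vscale x 4.
  llvm::ElementCount Half = EltCnt.divideCoefficientBy(2);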

Differential Revision: https://reviews.llvm.org/D87700

Added: 


Modified: 
clang/lib/CodeGen/CGBuiltin.cpp
llvm/include/llvm/CodeGen/ValueTypes.h
llvm/include/llvm/IR/DerivedTypes.h
llvm/include/llvm/Support/MachineValueType.h
llvm/include/llvm/Support/TypeSize.h
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
llvm/lib/CodeGen/TargetLoweringBase.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp
llvm/unittests/IR/VectorTypesTest.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 498d8640f329..102b5aefe8ff 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -5650,7 +5650,7 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
 if (BuiltinID == NEON::BI__builtin_neon_splatq_lane_v)
   NumElements = NumElements * 2;
 if (BuiltinID == NEON::BI__builtin_neon_splat_laneq_v)
-  NumElements = NumElements / 2;
+  NumElements = NumElements.divideCoefficientBy(2);
 
 Ops[0] = Builder.CreateBitCast(Ops[0], VTy);
 return EmitNeonSplat(Ops[0], cast(Ops[1]), NumElements);
@@ -8483,8 +8483,7 @@ Value 
*CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
   case SVE::BI__builtin_sve_svtbl2_f64: {
 SVETypeFlags TF(Builtin->TypeModifier);
 auto VTy = cast(getSVEType(TF));
-auto TupleTy = llvm::VectorType::get(VTy->getElementType(),
- VTy->getElementCount() * 2);
+auto TupleTy = llvm::VectorType::getDoubleElementsVectorType(VTy);
 Function *FExtr =
 CGM.getIntrinsic(Intrinsic::aarch64_sve_tuple_get, {VTy, TupleTy});
 Value *V0 = Builder.CreateCall(FExtr, {Ops[0], Builder.getInt32(0)});

diff  --git a/llvm/include/llvm/CodeGen/ValueTypes.h 
b/llvm/include/llvm/CodeGen/ValueTypes.h
index 164518faef22..b6f3fabd7f6a 100644
--- a/llvm/include/llvm/CodeGen/ValueTypes.h
+++ b/llvm/include/llvm/CodeGen/ValueTypes.h
@@ -414,7 +414,16 @@ namespace llvm {
   EVT EltVT = getVectorElementType();
   auto EltCnt = getVectorElementCount();
   assert(EltCnt.isKnownEven() && "Splitting vector, but not in half!");
-  return EVT::getVectorVT(Context, EltVT, EltCnt / 2);
+  return EVT::getVectorVT(Context, EltVT, EltCnt.divideCoefficientBy(2));
+}
+
+// Return a VT for a vector type with the same element type but
+// double the number of elements. The type returned may be an
+// extended type.
+EVT getDoubleNumVectorElementsVT(LLVMContext &Context) const {
+  EVT EltVT = getVectorElementType();
+  auto EltCnt = getVectorElementCount();
+  return EVT::getVectorVT(Context, EltVT, EltCnt * 2);
 }
 
 /// Returns true if the given vector is a power of 2.

diff  --git a/llvm/include/llvm/IR/DerivedTypes.h 
b/llvm/include/llvm/IR/DerivedTypes.h
index 619c699c2b97..7e9ea0e34c6b 100644
--- a/llvm/include/llvm/IR/DerivedTypes.h
+++ b/llvm/include/llvm/IR/DerivedTypes.h
@@ -504,7 +504,8 @@ class VectorType : public Type {
 auto EltCnt = VTy->getElementCount();
 assert(EltCnt.isKnownEven() &&
"Cannot halve vector with odd number of elements.");
-return VectorType::get(VTy->getElementType(), EltCnt/2);
+return VectorType::get(VTy->getElementType(),
+   EltCnt.divideCoefficientBy(2));
   }
 
   /// This static method returns a VectorType with twice as many elements as 
the

diff  --git a/llvm/include/llvm/Support/MachineValueType.h 
b/llvm/include/llvm/Support/MachineValueType.h
index b0d16ba3ef82..713f847535e8 100644
--- a/llvm/include/llvm/Support/MachineValueType.h
+++ b/llvm/include/llvm/Support/MachineValueType.h
@@ -425,7 +425,7 @@ namespace llvm {
   MVT

[clang] cea69fa - [SVE] Add fatal error for unnamed SVE variadic arguments

2020-10-30 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2020-10-30T13:35:47Z
New Revision: cea69fa4dcc4fcf3be62dba49ad012879d89377d

URL: 
https://github.com/llvm/llvm-project/commit/cea69fa4dcc4fcf3be62dba49ad012879d89377d
DIFF: 
https://github.com/llvm/llvm-project/commit/cea69fa4dcc4fcf3be62dba49ad012879d89377d.diff

LOG: [SVE] Add fatal error for unnamed SVE variadic arguments

We don't currently support passing unnamed variadic SVE arguments,
so I've added a fatal error when we hit such cases to prevent any
silent ABI issues in the future.

Differential Revision: https://reviews.llvm.org/D90230

Added: 
clang/test/CodeGen/aarch64-varargs-sve.c
llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
llvm/test/CodeGen/AArch64/sve-varargs-caller-broken.ll
llvm/test/CodeGen/AArch64/sve-varargs.ll

Modified: 
clang/lib/CodeGen/TargetInfo.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed: 




diff  --git a/clang/lib/CodeGen/TargetInfo.cpp 
b/clang/lib/CodeGen/TargetInfo.cpp
index e211a0214eb4..63502ccf7a38 100644
--- a/clang/lib/CodeGen/TargetInfo.cpp
+++ b/clang/lib/CodeGen/TargetInfo.cpp
@@ -5480,6 +5480,11 @@ class AArch64ABIInfo : public SwiftABIInfo {
 
   Address EmitVAArg(CodeGenFunction &CGF, Address VAListAddr,
 QualType Ty) const override {
+llvm::Type *BaseTy = CGF.ConvertType(Ty);
+if (isa(BaseTy))
+  llvm::report_fatal_error("Passing SVE types to variadic functions is "
+   "currently not supported");
+
 return Kind == Win64 ? EmitMSVAArg(CGF, VAListAddr, Ty)
  : isDarwinPCS() ? EmitDarwinVAArg(VAListAddr, Ty, CGF)
  : EmitAAPCSVAArg(VAListAddr, Ty, CGF);

diff  --git a/clang/test/CodeGen/aarch64-varargs-sve.c 
b/clang/test/CodeGen/aarch64-varargs-sve.c
new file mode 100644
index ..bf57c6e1770a
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-varargs-sve.c
@@ -0,0 +1,21 @@
+// REQUIRES: aarch64-registered-target
+// RUN: not %clang_cc1 -triple aarch64-linux-gnu -target-feature +sve 
-fallow-half-arguments-and-returns -emit-llvm -o - %s 2>&1 | FileCheck %s
+// RUN: not %clang_cc1 -triple arm64-apple-ios7 -target-abi darwinpcs 
-target-feature +sve -fallow-half-arguments-and-returns -emit-llvm -o - %s 2>&1 
| FileCheck %s
+
+// CHECK: Passing SVE types to variadic functions is currently not supported
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+double foo(char *str, ...) {
+  va_list ap;
+  svfloat64_t v;
+  double x;
+
+  va_start(ap, str);
+  v = va_arg(ap, svfloat64_t);
+  x = va_arg(ap, double);
+  va_end(ap);
+
+  return x + svaddv(svptrue_b8(), v);
+}

diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 1579a28613a3..89713be01c55 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4807,6 +4807,10 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
 for (unsigned i = 0; i != NumArgs; ++i) {
   MVT ArgVT = Outs[i].VT;
+  if (!Outs[i].IsFixed && ArgVT.isScalableVector())
+report_fatal_error("Passing SVE types to variadic functions is "
+   "currently not supported");
+
   ISD::ArgFlagsTy ArgFlags = Outs[i].Flags;
   CCAssignFn *AssignFn = CCAssignFnForCall(CallConv,
/*IsVarArg=*/ !Outs[i].IsFixed);
@@ -6606,6 +6610,10 @@ SDValue AArch64TargetLowering::LowerVAARG(SDValue Op, 
SelectionDAG &DAG) const {
   Chain = VAList.getValue(1);
   VAList = DAG.getZExtOrTrunc(VAList, DL, PtrVT);
 
+  if (VT.isScalableVector())
+report_fatal_error("Passing SVE types to variadic functions is "
+   "currently not supported");
+
   if (Align && *Align > MinSlotSize) {
 VAList = DAG.getNode(ISD::ADD, DL, PtrVT, VAList,
  DAG.getConstant(Align->value() - 1, DL, PtrVT));

diff  --git a/llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll 
b/llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
new file mode 100644
index ..cd097d5cbb1d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-varargs-callee-broken.ll
@@ -0,0 +1,22 @@
+; RUN: not --crash llc -mtriple arm64-apple-ios7 -mattr=+sve < %s 2>&1 | 
FileCheck %s
+
+; CHECK: Passing SVE types to variadic functions is currently not supported
+
+@.str = private unnamed_addr constant [4 x i8] c"fmt\00", align 1
+define void @foo(i8* %fmt, ...) nounwind {
+entry:
+  %fmt.addr = alloca i8*, align 8
+  %args = alloca i8*, align 8
+  %vc = alloca i32, align 4
+  %vv = alloca , align 16
+  store i8* %fmt, i8** %fmt.addr, align 8
+  %args1 = bitcast i8** %args to i8*
+  call void @llvm.va_start(i8* %args1)
+  %0 = va_arg i8** %args, i32
+  store i32 %0, i32* %vc, align 4
+  %1 = va_arg i8** %args, 
+  store  %1, * %vv, align 16
+  ret void
+}
+
+declare void @llvm.va_s

[clang-tools-extra] [clang] [llvm] [LoopVectorize] Improve algorithm for hoisting runtime checks (PR #73515)

2023-12-11 Thread David Sherwood via cfe-commits

https://github.com/david-arm updated 
https://github.com/llvm/llvm-project/pull/73515

>From 30251642f8c208c63f3f3097c337ef0d5bc633b5 Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Mon, 27 Nov 2023 13:43:26 +
Subject: [PATCH 1/5] [LoopVectorize] Improve algorithm for hoisting runtime
 checks

When attempting to hoist runtime checks out of a loop we currently
avoid creating pointer diff checks and prefer to do expanded range
checks instead. This gives us the opportunity to hoist runtime
checks out of a loop, since these checks are loop invariant. However,
in some cases the pointer diff checks would also be loop invariant
and so would naturally get hoisted anyway. Since diff checks are
cheaper, we should prefer to use those instead.
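
As a sketch of the kind of loop nest this targets (invented C, not taken
from the patch): both pointers advance by the same amount per outer
iteration, so the pointer difference src - dst is invariant in the outer
loop and the cheaper diff check naturally hoists above it:

  void copy_rows(int *dst, const int *src, int m, int n) {
    for (int i = 0; i < m; i++)      // outer loop
      for (int j = 0; j < n; j++)    // vectorized inner loop
        dst[i * n + j] = src[i * n + j];
  }
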
---
 llvm/lib/Analysis/LoopAccessAnalysis.cpp  |   5 +-
 .../LoopVectorize/runtime-checks-hoist.ll | 143 ++
 2 files changed, 121 insertions(+), 27 deletions(-)

diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3d1edd5f038a25..05765223397987 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -346,7 +346,10 @@ void RuntimePointerChecking::tryToCreateDiffCheck(
 auto *SinkStartAR = cast(SinkStartInt);
 const Loop *StartARLoop = SrcStartAR->getLoop();
 if (StartARLoop == SinkStartAR->getLoop() &&
-StartARLoop == InnerLoop->getParentLoop()) {
+StartARLoop == InnerLoop->getParentLoop() &&
+!SE->isKnownPredicate(ICmpInst::ICMP_EQ,
+  SrcStartAR->getStepRecurrence(*SE),
+  SinkStartAR->getStepRecurrence(*SE))) {
   LLVM_DEBUG(dbgs() << "LAA: Not creating diff runtime check, since these "
"cannot be hoisted out of the outer loop\n");
   CanUseDiffCheck = false;
diff --git a/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll 
b/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
index 891597cbdc48a8..81702bf34e96be 100644
--- a/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
+++ b/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
@@ -69,11 +69,11 @@ define void @diff_checks(ptr nocapture noundef writeonly 
%dst, ptr nocapture nou
 ; CHECK-NEXT:[[TMP14:%.*]] = add nuw nsw i64 [[TMP13]], [[TMP10]]
 ; CHECK-NEXT:[[TMP15:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 
[[TMP14]]
 ; CHECK-NEXT:[[TMP16:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], 
i32 0
-; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP16]], align 4, 
!alias.scope !0
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP16]], align 4, 
!alias.scope [[META0:![0-9]+]]
 ; CHECK-NEXT:[[TMP17:%.*]] = add nsw i64 [[TMP13]], [[TMP11]]
 ; CHECK-NEXT:[[TMP18:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 
[[TMP17]]
 ; CHECK-NEXT:[[TMP19:%.*]] = getelementptr inbounds i32, ptr [[TMP18]], 
i32 0
-; CHECK-NEXT:store <4 x i32> [[WIDE_LOAD]], ptr [[TMP19]], align 4, 
!alias.scope !3, !noalias !0
+; CHECK-NEXT:store <4 x i32> [[WIDE_LOAD]], ptr [[TMP19]], align 4, 
!alias.scope [[META3:![0-9]+]], !noalias [[META0]]
 ; CHECK-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:[[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
@@ -189,12 +189,12 @@ define void @full_checks(ptr nocapture noundef %dst, ptr 
nocapture noundef reado
 ; CHECK-NEXT:[[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], [[TMP3]]
 ; CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 
[[TMP5]]
 ; CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i32 0
-; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP7]], align 4, 
!alias.scope !9
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP7]], align 4, 
!alias.scope [[META9:![0-9]+]]
 ; CHECK-NEXT:[[TMP8:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 
[[TMP5]]
 ; CHECK-NEXT:[[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i32 0
-; CHECK-NEXT:[[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP9]], align 4, 
!alias.scope !12, !noalias !9
+; CHECK-NEXT:[[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP9]], align 4, 
!alias.scope [[META12:![0-9]+]], !noalias [[META9]]
 ; CHECK-NEXT:[[TMP10:%.*]] = add nsw <4 x i32> [[WIDE_LOAD2]], 
[[WIDE_LOAD]]
-; CHECK-NEXT:store <4 x i32> [[TMP10]], ptr [[TMP9]], align 4, 
!alias.scope !12, !noalias !9
+; CHECK-NEXT:store <4 x i32> [[TMP10]], ptr [[TMP9]], align 4, 
!alias.scope [[META12]], !noalias [[META9]]
 ; CHECK-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:[[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
@@ -319,13 +319,13 @@ define void @full_ch

[clang-tools-extra] [clang] [llvm] [LoopVectorize] Improve algorithm for hoisting runtime checks (PR #73515)

2023-12-12 Thread David Sherwood via cfe-commits

https://github.com/david-arm closed 
https://github.com/llvm/llvm-project/pull/73515


[clang-tools-extra] [llvm] [LoopVectorize] Enable hoisting of runtime checks by default (PR #71538)

2023-12-12 Thread David Sherwood via cfe-commits

https://github.com/david-arm updated 
https://github.com/llvm/llvm-project/pull/71538

>From 8a2af20a52fd851eaff1cfa7d50df8b994d0db0d Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Tue, 7 Nov 2023 13:57:17 +
Subject: [PATCH 1/2] [LoopVectorize] Enable hoisting of runtime checks by
 default

With commit https://reviews.llvm.org/D152366 I introduced
functionality that permitted the hoisting of runtime memory checks
from a vectorised inner loop to the preheader of the next outer-most
loop. This is useful for benchmarks like SPEC2017's x264 where the
inner loop is vectorised and only has a small trip count. In such
cases the runtime memory checks become expensive and since the checks
never fail in the case of x264 it makes sense to do this. However,
this behaviour was controlled by the flag -hoist-runtime-checks
which was off by default.

This patch enables this flag by default for all targets, since I
believe this is a generally beneficial thing to do. I have tested
this with SPEC2017 and I see 2.3% and 2.6% improvements with x264 on
neoverse-v1 and neoverse-n1, respectively. Similarly, I saw slight
improvements in the overall geomean on both machines. The only
other notable change was a 1% drop in the roms benchmark, which
was compensated for by a 1% improvement in fotonik3d.
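
For anyone who wants the old behaviour back, the cl::opt shown in the
diff below can presumably be flipped from the driver via the usual
-mllvm plumbing (an assumption on my part, not something the patch
documents):

  clang -O3 -mllvm -hoist-runtime-checks=false file.c
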
---
 llvm/lib/Analysis/LoopAccessAnalysis.cpp  |  2 +-
 .../invariant-store-vectorization.ll  | 86 +-
 .../multiple-strides-vectorization.ll | 90 ---
 .../runtime-checks-difference.ll  |  2 +-
 .../LoopVectorize/runtime-checks-hoist.ll |  2 +-
 5 files changed, 124 insertions(+), 58 deletions(-)

diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3d1edd5f038a25..05ca09968207fe 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -142,7 +142,7 @@ static cl::opt HoistRuntimeChecks(
 "hoist-runtime-checks", cl::Hidden,
 cl::desc(
 "Hoist inner loop runtime memory checks to outer loop if possible"),
-cl::location(VectorizerParams::HoistRuntimeChecks), cl::init(false));
+cl::location(VectorizerParams::HoistRuntimeChecks), cl::init(true));
 bool VectorizerParams::HoistRuntimeChecks;
 
 bool VectorizerParams::isInterleaveForced() {
diff --git 
a/llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll 
b/llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll
index 9e36649bcf73d6..52101fda6309f6 100644
--- a/llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll
+++ b/llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll
@@ -13,9 +13,6 @@ target datalayout = 
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3
 ; address.
 
 
-; memory check is found.conflict = b[max(n-1,1)] > a && (ptr a)+1 > (ptr b)
-
-
 define i32 @inv_val_store_to_inv_address_with_reduction(ptr %a, i64 %n, ptr 
%b) {
 ; CHECK-LABEL: @inv_val_store_to_inv_address_with_reduction(
 ; CHECK-NEXT:  entry:
@@ -346,74 +343,75 @@ define i32 @multiple_uniform_stores(ptr nocapture %var1, 
ptr nocapture readonly
 ; CHECK-NEXT:[[CMP20:%.*]] = icmp eq i32 [[ITR:%.*]], 0
 ; CHECK-NEXT:br i1 [[CMP20]], label [[FOR_END10:%.*]], label 
[[FOR_COND1_PREHEADER_PREHEADER:%.*]]
 ; CHECK:   for.cond1.preheader.preheader:
-; CHECK-NEXT:[[SCEVGEP3:%.*]] = getelementptr i8, ptr [[VAR2:%.*]], i64 4
-; CHECK-NEXT:[[INVARIANT_GEP5:%.*]] = getelementptr i8, ptr [[VAR1:%.*]], 
i64 4
+; CHECK-NEXT:[[TMP0:%.*]] = add i32 [[ITR]], -1
+; CHECK-NEXT:[[TMP1:%.*]] = zext i32 [[TMP0]] to i64
+; CHECK-NEXT:[[TMP2:%.*]] = shl nuw nsw i64 [[TMP1]], 2
+; CHECK-NEXT:[[TMP3:%.*]] = getelementptr i8, ptr [[VAR1:%.*]], i64 
[[TMP2]]
+; CHECK-NEXT:[[SCEVGEP:%.*]] = getelementptr i8, ptr [[TMP3]], i64 4
+; CHECK-NEXT:[[SCEVGEP2:%.*]] = getelementptr i8, ptr [[VAR2:%.*]], i64 4
 ; CHECK-NEXT:br label [[FOR_COND1_PREHEADER:%.*]]
 ; CHECK:   for.cond1.preheader:
 ; CHECK-NEXT:[[INDVARS_IV23:%.*]] = phi i64 [ [[INDVARS_IV_NEXT24:%.*]], 
[[FOR_INC8:%.*]] ], [ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]
 ; CHECK-NEXT:[[J_022:%.*]] = phi i32 [ [[J_1_LCSSA:%.*]], [[FOR_INC8]] ], 
[ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]
-; CHECK-NEXT:[[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV23]], 2
-; CHECK-NEXT:[[SCEVGEP:%.*]] = getelementptr i8, ptr [[VAR1]], i64 [[TMP0]]
-; CHECK-NEXT:[[GEP6:%.*]] = getelementptr i8, ptr [[INVARIANT_GEP5]], i64 
[[TMP0]]
 ; CHECK-NEXT:[[CMP218:%.*]] = icmp ult i32 [[J_022]], [[ITR]]
 ; CHECK-NEXT:br i1 [[CMP218]], label [[FOR_BODY3_LR_PH:%.*]], label 
[[FOR_INC8]]
 ; CHECK:   for.body3.lr.ph:
 ; CHECK-NEXT:[[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[VAR1]], 
i64 [[INDVARS_IV23]]
-; CHECK-NEXT:[[TMP1:%.*]] = zext i32 [[J_022]] to i64
+; CHECK-NEXT:[[TMP4:%.*]] = zext i32 [[J_022]] to i64
 ; CHECK-NEXT:[[ARRAYIDX5_PROMOTED:%.*]] = load i32, pt

[clang-tools-extra] [llvm] [clang] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-12 Thread David Sherwood via cfe-commits


@@ -0,0 +1,726 @@
+
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-lit"

david-arm wrote:

Sorry, I just realised I have lost this change somehow. I'll fix it.

https://github.com/llvm/llvm-project/pull/72273


[clang] [clang-tools-extra] [llvm] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-12 Thread David Sherwood via cfe-commits


@@ -0,0 +1,726 @@
+
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-lit"
+
+static cl::opt<bool>
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt<bool> DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Value *Start,
+Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+Value *MaxLen, Value *Index, Value *Start,
+bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Recognize AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout())
+  .run(L);
+}
+
+} // end anonymous namespace
+
+char AArch64LoopIdiomTransformLegacyPass::ID = 0;
+
+INITIALIZE_PASS_BEGIN(
+AArch64LoopIdiomTransformLegacyPass, "aarch64-lit",
+"Transform specific loop idioms into optimised vector forms", false, false)
+INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
+INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
+INITIALIZE_PASS_END(
+AArch64LoopIdiomTransformLegacyPass, "aarch64-lit",
+"Transform specific loop idioms into optimised vector forms", false, false)
+
+Pass *llvm::createAArch64LoopIdiomTransformPass() {
+  return new AArch64LoopIdiomTransformLegacyPass();
+}
+
+PreservedAnalyses
+AArch64LoopIdiomTransformPass::run(Loop &L, LoopAnalysisManager &AM,
+   LoopStandardAnalysisResults &AR,
+   LPMUpdater &) {
+  if (DisableAll)
+return PreservedAnalyses::all();
+
+  const auto *DL = &L.getHeader()->getModule()->getDataLayout();
+
+  AArch64LoopIdiomTransform LIT(&AR.DT, &AR.LI, &AR.TTI, DL);
+  if (!LIT.run(&L))
+return PreservedAnalyses::all();
+
+  return PreservedAnalyses::none();
+}
+
+//===-

[llvm] [clang-tools-extra] [LoopVectorize] Enable hoisting of runtime checks by default (PR #71538)

2023-12-12 Thread David Sherwood via cfe-commits

david-arm wrote:

Gentle ping! https://github.com/llvm/llvm-project/pull/73515 has now landed so 
I think this patch should be ready to go.

https://github.com/llvm/llvm-project/pull/71538


[clang] [Clang][SME2] Add multi-vector zip & unzip builtins (PR #74841)

2023-12-12 Thread David Sherwood via cfe-commits




david-arm wrote:

For builtins that operate purely on SVE vectors I think we've used the
convention of adding _vector_ to the test name, e.g.
acle_sme2_vector_rshl.c. Should we do the same here?

https://github.com/llvm/llvm-project/pull/74841


[clang-tools-extra] [clang] [llvm] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-13 Thread David Sherwood via cfe-commits


@@ -0,0 +1,839 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass recognizes certain loop idioms and transforms them into more
+// optimized versions of the same loop. In cases where this happens, it can
+// be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt<bool>
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt<bool> DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt<bool> VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Value *Start,
+Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+Value *MaxLen, Value *Index, Value *Start,
+bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout())
+  .run(L);
+}
+
+} // end anonymou

[clang-tools-extra] [clang] [llvm] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-13 Thread David Sherwood via cfe-commits


@@ -0,0 +1,839 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass implements a pass that recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Value *Start,
+Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+Value *MaxLen, Value *Index, Value *Start,
+bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout())
+  .run(L);
+}
+
+} // end anonymou

[llvm] [clang] [clang-tools-extra] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-13 Thread David Sherwood via cfe-commits


@@ -0,0 +1,726 @@
+
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-lit"
+
+static cl::opt<bool>
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt<bool> DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl<BasicBlock *> &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Value *Start,
+Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+Value *MaxLen, Value *Index, Value *Start,
+bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Recognize AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired<DominatorTreeWrapperPass>();
+AU.addRequired<LoopInfoWrapperPass>();
+AU.addRequired<TargetTransformInfoWrapperPass>();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+  auto *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+  auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout())
+  .run(L);
+}
+
+} // end anonymous namespace
+
+char AArch64LoopIdiomTransformLegacyPass::ID = 0;
+
+INITIALIZE_PASS_BEGIN(
+AArch64LoopIdiomTransformLegacyPass, "aarch64-lit",
+"Transform specific loop idioms into optimised vector forms", false, false)
+INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
+INITIALIZE_PASS_DEPENDENCY(LCSSAWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
+INITIALIZE_PASS_END(
+AArch64LoopIdiomTransformLegacyPass, "aarch64-lit",
+"Transform specific loop idioms into optimised vector forms", false, false)
+
+Pass *llvm::createAArch64LoopIdiomTransformPass() {
+  return new AArch64LoopIdiomTransformLegacyPass();
+}
+
+PreservedAnalyses
+AArch64LoopIdiomTransformPass::run(Loop &L, LoopAnalysisManager &AM,
+   LoopStandardAnalysisResults &AR,
+   LPMUpdater &) {
+  if (DisableAll)
+return PreservedAnalyses::all();
+
+  const auto *DL = &L.getHeader()->getModule()->getDataLayout();
+
+  AArch64LoopIdiomTransform LIT(&AR.DT, &AR.LI, &AR.TTI, DL);
+  if (!LIT.run(&L))
+return PreservedAnalyses::all();
+
+  return PreservedAnalyses::none();
+}
+
+//===-

[llvm] [clang] [Clang][SME2] Add builtins for moving multi-vectors to/from ZA (PR #71191)

2023-12-14 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! I had one minor comment, but I won't hold up the patch for it.

https://github.com/llvm/llvm-project/pull/71191
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang][SME2] Add builtins for moving multi-vectors to/from ZA (PR #71191)

2023-12-14 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/71191
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [Clang][SME2] Add builtins for moving multi-vectors to/from ZA (PR #71191)

2023-12-14 Thread David Sherwood via cfe-commits


@@ -299,6 +299,44 @@ multiclass ZAAddSub {
 defm SVADD : ZAAddSub<"add">;
 defm SVSUB : ZAAddSub<"sub">;
 
+// SME2 - MOVA
+
+//
+// Single, 2 and 4 vector-group read/write intrinsics.
+//
+
+multiclass ZAWrite_VG<string n, string t, string i, list<ImmCheck> checks> {
+  def NAME # _VG2_H : Inst<"svwrite_hor_" # n # "_vg2",   "vim2", t, MergeNone, i # "_hor_vg2", [IsSharedZA, IsStreaming], checks>;
+  def NAME # _VG2_V : Inst<"svwrite_ver_" # n # "_vg2",   "vim2", t, MergeNone, i # "_ver_vg2", [IsSharedZA, IsStreaming], checks>;
+  def NAME # _VG4_H : Inst<"svwrite_hor_" # n # "_vg4",   "vim4", t, MergeNone, i # "_hor_vg4", [IsSharedZA, IsStreaming], checks>;
+  def NAME # _VG4_V : Inst<"svwrite_ver_" # n # "_vg4",   "vim4", t, MergeNone, i # "_ver_vg4", [IsSharedZA, IsStreaming], checks>;
+  def NAME # _VG1x2 : Inst<"svwrite_" # n # "_vg1x2", "vm2",  t, MergeNone, i # "_vg1x2",   [IsSharedZA, IsStreaming], []>;
+  def NAME # _VG1x4 : Inst<"svwrite_" # n # "_vg1x4", "vm4",  t, MergeNone, i # "_vg1x4",   [IsSharedZA, IsStreaming], []>;
+}
+
+let TargetGuard = "sme2" in {
+  defm SVWRITE_ZA8  : ZAWrite_VG<"za8[_{d}]",  "cUc",   "aarch64_sme_write", [ImmCheck<0, ImmCheck0_0>]>;

david-arm wrote:

This is just a thought - is it worth pushing the `"[_{d}]"` bit into the 
multiclass given it's the same for each size, i.e.

```
  def NAME # _VG2_H : Inst<"svwrite_hor_" # n # "[_{d}]_vg2",   "vim2", t, MergeNone, i # "_hor_vg2", [IsSharedZA, IsStreaming], checks>;
```

and same question for the reads.

https://github.com/llvm/llvm-project/pull/71191
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang-tools-extra] [LoopVectorize] Enable hoisting of runtime checks by default (PR #71538)

2023-12-18 Thread David Sherwood via cfe-commits

https://github.com/david-arm closed 
https://github.com/llvm/llvm-project/pull/71538
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] ceb6c23 - [NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests

2022-07-21 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2022-07-21T15:23:00+01:00
New Revision: ceb6c23b708d4cae3fbb0a569c5ac14069524a63

URL: 
https://github.com/llvm/llvm-project/commit/ceb6c23b708d4cae3fbb0a569c5ac14069524a63
DIFF: 
https://github.com/llvm/llvm-project/commit/ceb6c23b708d4cae3fbb0a569c5ac14069524a63.diff

LOG: [NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests

This patch is in preparation for enabling vectorisation with tail-folding
by default for SVE targets. Once we do that, many existing tests that
depend upon having normal unpredicated vector loops will break. For
all such tests I have added the flag:

  -prefer-predicate-over-epilogue=scalar-epilogue

Differential Revision: https://reviews.llvm.org/D129137

Added: 


Modified: 
clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c

llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll
llvm/test/Transforms/LoopVectorize/AArch64/i1-reg-usage.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

llvm/test/Transforms/LoopVectorize/AArch64/scalarize-store-with-predication.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter-cost.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-illegal-type.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-loads.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-large-strides.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-masked-loadstore.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse-mask4.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse-mask4.ll
llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll

Removed: 




diff  --git a/clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c 
b/clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c
index bccd328f0ccad..e306f44c27fb3 100644
--- a/clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/aarch64-sve-vector-bits-codegen.c
@@ -1,7 +1,11 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S -o - %s -mvscale-min=2 -mvscale-max=2 | FileCheck %s --check-prefixes=CHECK,CHECK256
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S -o - %s -mvscale-min=4 -mvscale-max=4 | FileCheck %s --check-prefixes=CHECK,CHECK512
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S -o - %s -mvscale-min=8 -mvscale-max=8 | FileCheck %s --check-prefixes=CHECK,CHECK1024
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S -o - %s -mvscale-min=16 -mvscale-max=16 | FileCheck %s --check-prefixes=CHECK,CHECK2048
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S \
+// RUN:   -mllvm -prefer-predicate-over-epilogue=scalar-epilogue -o - %s -mvscale-min=2 -mvscale-max=2 | FileCheck %s --check-prefixes=CHECK,CHECK256
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S \
+// RUN:   -mllvm -prefer-predicate-over-epilogue=scalar-epilogue -o - %s -mvscale-min=4 -mvscale-max=4 | FileCheck %s --check-prefixes=CHECK,CHECK512
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S \
+// RUN:   -mllvm -prefer-predicate-over-epilogue=scalar-epilogue -o - %s -mvscale-min=8 -mvscale-max=8 | FileCheck %s --check-prefixes=CHECK,CHECK1024
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -O2 -S \
+// RUN:   -m

[clang] fbb1194 - [AArch64] Add Neoverse V2 CPU support

2022-09-27 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2022-09-27T07:56:08Z
New Revision: fbb119412f143530a23d22b6b0f90d4cf2303fbf

URL: 
https://github.com/llvm/llvm-project/commit/fbb119412f143530a23d22b6b0f90d4cf2303fbf
DIFF: 
https://github.com/llvm/llvm-project/commit/fbb119412f143530a23d22b6b0f90d4cf2303fbf.diff

LOG: [AArch64] Add Neoverse V2 CPU support

Adds support for the Neoverse V2 CPU to the AArch64 backend.

Differential Revision: https://reviews.llvm.org/D134352

Added: 


Modified: 
clang/docs/ReleaseNotes.rst
clang/test/Driver/aarch64-mcpu.c
clang/test/Misc/target-invalid-cpu-note.c
llvm/include/llvm/Support/AArch64TargetParser.def
llvm/lib/Support/Host.cpp
llvm/lib/Target/AArch64/AArch64.td
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/test/CodeGen/AArch64/cpus.ll
llvm/unittests/Support/TargetParserTest.cpp

Removed: 




diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 5fe86f8bdecd..0dd3b54aa475 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -404,6 +404,8 @@ Arm and AArch64 Support in Clang
 - ``-march`` values for targeting armv2, armv2A, armv3 and armv3M have been removed.
   Their presence gave the impression that Clang can correctly generate code for
   them, which it cannot.
+- Add driver and tuning support for Neoverse V2 via the flag ``-mcpu=neoverse-v2``.
+  Native detection is also supported via ``-mcpu=native``.
 
 Floating Point Support in Clang
 ---

diff  --git a/clang/test/Driver/aarch64-mcpu.c 
b/clang/test/Driver/aarch64-mcpu.c
index d3c30b31f4fc..8b2701f27a9e 100644
--- a/clang/test/Driver/aarch64-mcpu.c
+++ b/clang/test/Driver/aarch64-mcpu.c
@@ -49,6 +49,8 @@
 // NEOVERSE-E1: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "neoverse-e1"
 // RUN: %clang -target aarch64 -mcpu=neoverse-v1  -### -c %s 2>&1 | FileCheck -check-prefix=NEOVERSE-V1 %s
 // NEOVERSE-V1: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "neoverse-v1"
+// RUN: %clang -target aarch64 -mcpu=neoverse-v2  -### -c %s 2>&1 | FileCheck -check-prefix=NEOVERSE-V2 %s
+// NEOVERSE-V2: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "neoverse-v2"
 // RUN: %clang -target aarch64 -mcpu=neoverse-n1 -### -c %s 2>&1 | FileCheck -check-prefix=NEOVERSE-N1 %s
 // NEOVERSE-N1: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "neoverse-n1"
 // RUN: %clang -target aarch64 -mcpu=neoverse-n2 -### -c %s 2>&1 | FileCheck -check-prefix=NEOVERSE-N2 %s

diff  --git a/clang/test/Misc/target-invalid-cpu-note.c 
b/clang/test/Misc/target-invalid-cpu-note.c
index 1f5899fa2649..0fa4d3164111 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -5,11 +5,11 @@
 
 // RUN: not %clang_cc1 -triple arm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix AARCH64
 // AARCH64: error: unknown target CPU 'not-a-cpu'
-// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78c, cortex-a710, cortex-r82, cortex-x1, cortex-x1c, cortex-x2, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-512tvb, neoverse-v1, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-m1, apple-m2, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1{{$}}
+// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78c, cortex-a710, cortex-r82, cortex-x1, cortex-x1c, cortex-x2, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-512tvb, neoverse-v1, neoverse-v2, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-m1, apple-m2, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1{{$}}
 
 // RUN: not %clang_cc1 -triple arm64--- -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE_AARCH64
 // TUNE_AARCH64: error: unknown target CPU 'not-a-cpu'
-// TUNE_AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78c, cortex-a710, cortex-r82, cortex-x1, cortex

[clang] [clang] Add nuw attribute to GEPs (PR #105496)

2024-08-28 Thread David Sherwood via cfe-commits

david-arm wrote:

> This patch breaks: https://lab.llvm.org/buildbot/#/builders/25/builds/1952 
> https://lab.llvm.org/buildbot/#/builders/52/builds/1775

From the buildbot run I can see 12 or 13 changes in the build that failed. Just out of curiosity how did you find out it was this patch that caused it?

https://github.com/llvm/llvm-project/pull/105496
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Add nuw attribute to GEPs (PR #105496)

2024-08-28 Thread David Sherwood via cfe-commits

david-arm wrote:

> > > This patch breaks: 
> > > https://lab.llvm.org/buildbot/#/builders/25/builds/1952 
> > > https://lab.llvm.org/buildbot/#/builders/52/builds/1775
> > 
> > 
> > From the buildbot run I can see 12 or 13 changes in the build that failed. 
> > Just out of curiosity how did you find out it was this patch that caused it?
> 
> Bisected and reverted locally.

Thanks! I was just wondering if it was something obvious from the patch or 
whether there was a bisecting tool to use in github. But bisecting and 
reverting locally is the same thing I'd do.

https://github.com/llvm/llvm-project/pull/105496
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [clang] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

2024-01-18 Thread David Sherwood via cfe-commits

https://github.com/david-arm updated 
https://github.com/llvm/llvm-project/pull/76034

From a4caa47dc8d2db75f6bb2ac3f880da4e1f6bea82 Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Tue, 19 Dec 2023 16:07:33 +
Subject: [PATCH 1/6] Add tests showing runtime checks cost with low trip
 counts

---
 .../AArch64/low_trip_memcheck_cost.ll | 187 ++
 1 file changed, 187 insertions(+)
 create mode 100644 
llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll

diff --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll 
b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
new file mode 100644
index 00..397521c2d3dc8f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
@@ -0,0 +1,187 @@
+; REQUIRES: asserts
+; RUN: opt -p loop-vectorize -debug-only=loop-vectorize -S -disable-output < %s 2>&1 | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+
+define void @outer_no_tc(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %m, i64 noundef %n) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_no_tc'
+; CHECK:  Calculating cost of runtime checks:
+; CHECK:  Total cost of runtime checks: 6
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+entry:
+  br label %outer.loop
+
+outer.loop:
+  %outer.iv = phi i64 [ %outer.iv.next, %inner.exit ], [ 0, %entry ]
+  %mul.us = mul nsw i64 %outer.iv, %n
+  br label %inner.loop
+
+inner.loop:
+  %inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
+  %add.us = add nuw nsw i64 %inner.iv, %mul.us
+  %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+  %0 = load i8, ptr %arrayidx.us, align 1
+  %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+  %1 = load i8, ptr %arrayidx7.us, align 1
+  %add9.us = add i8 %1, %0
+  store i8 %add9.us, ptr %arrayidx7.us, align 1
+  %inner.iv.next = add nuw nsw i64 %inner.iv, 1
+  %exitcond.not = icmp eq i64 %inner.iv.next, %n
+  br i1 %exitcond.not, label %inner.exit, label %inner.loop
+
+inner.exit:
+  %outer.iv.next = add nuw nsw i64 %outer.iv, 1
+  %exitcond27.not = icmp eq i64 %outer.iv.next, %m
+  br i1 %exitcond27.not, label %outer.exit, label %outer.loop
+
+outer.exit:
+  ret void
+}
+
+
+define void @outer_known_tc3(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %n) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_known_tc3'
+; CHECK:  Calculating cost of runtime checks:
+; CHECK:  Total cost of runtime checks: 6
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+entry:
+  br label %outer.loop
+
+outer.loop:
+  %outer.iv = phi i64 [ %outer.iv.next, %inner.exit ], [ 0, %entry ]
+  %mul.us = mul nsw i64 %outer.iv, %n
+  br label %inner.loop
+
+inner.loop:
+  %inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
+  %add.us = add nuw nsw i64 %inner.iv, %mul.us
+  %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+  %0 = load i8, ptr %arrayidx.us, align 1
+  %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+  %1 = load i8, ptr %arrayidx7.us, align 1
+  %add9.us = add i8 %1, %0
+  store i8 %add9.us, ptr %arrayidx7.us, align 1
+  %inner.iv.next = add nuw nsw i64 %inner.iv, 1
+  %exitcond.not = icmp eq i64 %inner.iv.next, %n
+  br i1 %exitcond.not, label %inner.exit, label %inner.loop
+
+inner.exit:
+  %outer.iv.next = add nuw nsw i64 %outer.iv, 1
+  %exitcond26.not = icmp eq i64 %outer.iv.next, 3
+  br i1 %exitcond26.not, label %outer.exit, label %outer.loop
+
+outer.exit:
+  ret void
+}
+
+
+define void @outer_known_tc64(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %n) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_known_tc64'
+; CHECK:  Calculating cost of runtime checks:
+; CHECK:  Total cost of runtime checks: 6
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+entry:
+  br label %outer.loop
+
+outer.loop:
+  %outer.iv = phi i64 [ %outer.iv.next, %inner.exit ], [ 0, %entry ]
+  %mul.us = mul nsw i64 %outer.iv, %n
+  br label %inner.loop
+
+inner.loop:
+  %inner.iv = phi i64 [ 0, %outer.loop ], [ %inner.iv.next, %inner.loop ]
+  %add.us = add nuw nsw i64 %inner.iv, %mul.us
+  %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+  %0 = load i8, ptr %arrayidx.us, align 1
+  %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+  %1 = load i8, ptr %arrayidx7.us, align 1
+  %add9.us = add i8 %1, %0
+  store i8 %add9.us, ptr %arrayidx7.us, align 1
+  %inner.iv.next = add nuw nsw i64 %inner.iv, 1
+  %exitcond.not = icmp eq i64 %inner.iv.next, %n
+  br i1 %exitcond.not, label %inner.exit, label %inner.loop
+
+inner.exit:
+  %outer.iv.next = add nuw nsw i64 %outer.iv, 1
+  %exitcond26.not = icmp eq i64 %outer.iv.next, 64
+  br i1 %exitcond26.not, label %outer.exit, label %outer.loop
+
+outer.exit:
+  ret void
+}
+
+
+defi

[llvm] [clang-tools-extra] [clang] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

2024-01-18 Thread David Sherwood via cfe-commits


@@ -2076,16 +2081,61 @@ class GeneratedRTChecks {
 LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
 RTCheckCost += C;
   }
-if (MemCheckBlock)
+if (MemCheckBlock) {
+  InstructionCost MemCheckCost = 0;
   for (Instruction &I : *MemCheckBlock) {
 if (MemCheckBlock->getTerminator() == &I)
   continue;
 InstructionCost C =
 TTI->getInstructionCost(&I, TTI::TCK_RecipThroughput);
 LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
-RTCheckCost += C;
+MemCheckCost += C;
   }
 
+  // If the runtime memory checks are being created inside an outer loop
+  // we should find out if these checks are outer loop invariant. If so,
+  // the checks will likely be hoisted out and so the effective cost will
+  // reduce according to the outer loop trip count.
+  if (OuterLoop) {
+ScalarEvolution *SE = MemCheckExp.getSE();
+// TODO: We could refine this further by analysing every individual
+// memory check, since there could be a mixture of loop variant and
+// invariant checks that mean the final condition is variant. However,
+// I think it would need further analysis to prove this is beneficial.
+const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
+if (SE->isLoopInvariant(Cond, OuterLoop)) {
+  // It seems reasonable to assume that we can reduce the effective
+  // cost of the checks even when we know nothing about the trip
+  // count. Here I've assumed that the outer loop executes at least
+  // twice.
+  unsigned BestTripCount = 2;
+
+  // If exact trip count is known use that.
+  if (unsigned SmallTC = SE->getSmallConstantTripCount(OuterLoop))
+BestTripCount = SmallTC;
+  else if (LoopVectorizeWithBlockFrequency) {
+// Else use profile data if available.
+if (auto EstimatedTC = getLoopEstimatedTripCount(OuterLoop))
+  BestTripCount = *EstimatedTC;
+  }
+
+  InstructionCost NewMemCheckCost = MemCheckCost / BestTripCount;
+
+  // Let's ensure the cost is always at least 1.
+  NewMemCheckCost = std::max(*NewMemCheckCost.getValue(), (long)1);

david-arm wrote:

Good spot! I hope I've fixed it now. :)

https://github.com/llvm/llvm-project/pull/76034
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
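
As a rough illustration of the hoisting logic in the quoted hunk, the arithmetic works out as in this standalone sketch (plain integers stand in for InstructionCost, and the helper name is made up for the example):

```
// Sketch of the amortisation above: divide the memcheck cost by the assumed
// outer-loop trip count, then clamp so the effective cost never drops to 0.
#include <algorithm>
#include <cstdint>
#include <iostream>

static int64_t effectiveMemCheckCost(int64_t MemCheckCost, unsigned BestTripCount) {
  int64_t NewCost = MemCheckCost / static_cast<int64_t>(BestTripCount);
  return std::max<int64_t>(NewCost, 1); // ensure the cost is always at least 1
}

int main() {
  // With the total memcheck cost of 6 seen in the new tests:
  std::cout << effectiveMemCheckCost(6, 2) << '\n';  // unknown trip count, assume 2 -> 3
  std::cout << effectiveMemCheckCost(6, 3) << '\n';  // known trip count 3 -> 2
  std::cout << effectiveMemCheckCost(6, 64) << '\n'; // 6/64 == 0, clamped -> 1
}
```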


[clang] [LTO] Fix Veclib flags correctly pass to LTO flags (PR #78749)

2024-01-22 Thread David Sherwood via cfe-commits


@@ -31,3 +31,31 @@
 
 // RUN: %clang -fveclib=Accelerate %s -nodefaultlibs -target arm64-apple-ios8.0.0 -### 2>&1 | FileCheck --check-prefix=CHECK-LINK-NODEFAULTLIBS %s
 // CHECK-LINK-NODEFAULTLIBS-NOT: "-framework" "Accelerate"
+
+
+/* Verify that the correct vector library is passed to LTO flags. */
+
+
+// RUN: %clang -### -fveclib=none -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-NOLIB %s
+// CHECK-LTO-NOLIB: "-plugin-opt=-vector-library=none"
+
+// RUN: %clang -### -fveclib=Accelerate -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-ACCELERATE %s
+// CHECK-LTO-ACCELERATE: "-plugin-opt=-vector-library=Accelerate"
+
+// RUN: %clang -### -fveclib=LIBMVEC -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-LIBMVEC %s
+// CHECK-LTO-LIBMVEC: "-plugin-opt=-vector-library=LIBMVEC-X86"
+
+// RUN: %clang -### -fveclib=MASSV -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-MASSV %s
+// CHECK-LTO-MASSV: "-plugin-opt=-vector-library=MASSV"
+
+// RUN: not %clang -### -fveclib=SVML -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-SVML %s
+// CHECK-LTO-SVML: "-plugin-opt=-vector-library=SVML"
+
+// RUN: %clang -### -fveclib=SLEEF -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-SLEEF %s
+// CHECK-LTO-SLEEF: "-plugin-opt=-vector-library=sleefgnuabi"
+
+// RUN: %clang -### -fveclib=Darwin_libsystem_m -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-DARWIN %s
+// CHECK-LTO-DARWIN: "-plugin-opt=-vector-library=Darwin_libsystem_m"
+
+// RUN: %clang -### -fveclib=ArmPL -flto %s -v 2>&1  | FileCheck -check-prefix CHECK-LTO-ARMPL %s

david-arm wrote:

Looks like `--target=aarch64-none-none` is needed for SLEEF and ArmPL perhaps? 
In the first 8 RUN lines it looks like we don't specify the target except for 
those cases.

https://github.com/llvm/llvm-project/pull/78749
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LTO] Fix Veclib flags correctly pass to LTO flags (PR #78749)

2024-01-22 Thread David Sherwood via cfe-commits


@@ -783,6 +783,28 @@ void tools::addLTOOptions(const ToolChain &ToolChain, const ArgList &Args,
  "-generate-arange-section"));
   }
 
+  // Pass vector library arguments to LTO.
+  Arg *ArgVecLib = Args.getLastArg(options::OPT_fveclib);
+  if (ArgVecLib && ArgVecLib->getNumValues() == 1) {
+// Map the vector library names from clang front-end to opt front-end. The
+// values are taken from the TargetLibraryInfo class command line options.
+std::optional<StringRef> OptVal =
+llvm::StringSwitch<std::optional<StringRef>>(ArgVecLib->getValue())

david-arm wrote:

Is it possible to refactor and reuse existing TargetLibraryInfo code, i.e. 
create a common static function that maps the values so that it can be called 
in multiple places?

https://github.com/llvm/llvm-project/pull/78749
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [LTO] Fix Veclib flags correctly pass to LTO flags (PR #78749)

2024-01-22 Thread David Sherwood via cfe-commits


@@ -783,6 +783,28 @@ void tools::addLTOOptions(const ToolChain &ToolChain, const ArgList &Args,
  "-generate-arange-section"));
   }
 
+  // Pass vector library arguments to LTO.
+  Arg *ArgVecLib = Args.getLastArg(options::OPT_fveclib);
+  if (ArgVecLib && ArgVecLib->getNumValues() == 1) {
+// Map the vector library names from clang front-end to opt front-end. The
+// values are taken from the TargetLibraryInfo class command line options.
+std::optional<StringRef> OptVal =
+llvm::StringSwitch<std::optional<StringRef>>(ArgVecLib->getValue())

david-arm wrote:

Yes I think that would work, i.e. having a static function in 
TargetLibraryInfo.h that can be called in two places and doesn't have a 
dependency on the component/library. Having said that, I won't hold this patch 
up for this if it's too difficult!

https://github.com/llvm/llvm-project/pull/78749
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
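
For reference, a minimal sketch of the kind of shared helper being suggested here, with the mappings taken from the CHECK lines in the quoted test (the function name and where it would live are assumptions, not the actual patch):

```
// Sketch only: one shared mapping from -fveclib= values to the names that
// -vector-library= expects, usable from both the driver and the LTO path.
#include <optional>
#include <string_view>

std::optional<std::string_view> mapVecLibName(std::string_view Name) {
  if (Name == "none")               return "none";
  if (Name == "Accelerate")         return "Accelerate";
  if (Name == "LIBMVEC")            return "LIBMVEC-X86";
  if (Name == "MASSV")              return "MASSV";
  if (Name == "SVML")               return "SVML";
  if (Name == "SLEEF")              return "sleefgnuabi";
  if (Name == "Darwin_libsystem_m") return "Darwin_libsystem_m";
  if (Name == "ArmPL")              return "ArmPL"; // assumed from the RUN line above
  return std::nullopt; // unknown value: let the caller diagnose it
}
```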


[clang] [Clang][AArch64] Add missing SME/SVE2.1 feature macros (PR #98285)

2024-07-12 Thread David Sherwood via cfe-commits

david-arm wrote:

Is it worth adding a link to the ACLE that describes the features in the commit 
message?

https://github.com/llvm/llvm-project/pull/98285
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [clang-tools-extra] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2024-01-09 Thread David Sherwood via cfe-commits

https://github.com/david-arm closed 
https://github.com/llvm/llvm-project/pull/72273
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang-tools-extra] [llvm] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2024-01-09 Thread David Sherwood via cfe-commits

david-arm wrote:

Hi @dyung, sorry about this! It passed for me locally. It sounds like it needs 
a REQUIRED aarch64-target somewhere then.

I'll try to fix it asap.


https://github.com/llvm/llvm-project/pull/72273
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [clang] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2024-01-09 Thread David Sherwood via cfe-commits

david-arm wrote:

@dyung - fix pending here https://github.com/llvm/llvm-project/pull/77467

https://github.com/llvm/llvm-project/pull/72273
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits



[llvm] [clang] [AArch64][SME] Fix multi vector cvt builtins (PR #77656)

2024-01-11 Thread David Sherwood via cfe-commits


@@ -34,118 +34,118 @@ define <vscale x 8 x bfloat> @multi_vector_cvt_x2_bf16(<vscale x 8 x bfloat> %unu
 ;
 ; FCVTZS
 ;
-define {<vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x2_f32_s32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1) {
-; CHECK-LABEL: multi_vector_cvt_x2_f32_s32:
+define {<vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x2_s32_f32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1) {
+; CHECK-LABEL: multi_vector_cvt_x2_s32_f32:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:    mov z3.d, z2.d
 ; CHECK-NEXT:    mov z2.d, z1.d
 ; CHECK-NEXT:    fcvtzs { z0.s, z1.s }, { z2.s, z3.s }
 ; CHECK-NEXT:    ret
-  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvts.x2.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1)
-  ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %res
+  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvts.x2.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1)
+  ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %res
 }
 
-define {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x4_f32_s32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3) {
-; CHECK-LABEL: multi_vector_cvt_x4_f32_s32:
+define {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x4_s32_f32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3) {
+; CHECK-LABEL: multi_vector_cvt_x4_s32_f32:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:    mov z7.d, z4.d
 ; CHECK-NEXT:    mov z6.d, z3.d
 ; CHECK-NEXT:    mov z5.d, z2.d
 ; CHECK-NEXT:    mov z4.d, z1.d
 ; CHECK-NEXT:    fcvtzs { z0.s - z3.s }, { z4.s - z7.s }
 ; CHECK-NEXT:    ret
-  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvts.x4.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3)
-  ret {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} %res
+  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvts.x4.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3)
+  ret {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} %res
 }
 
 ;
 ; FCVTZU
 ;
-define {<vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x2_f32_u32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1) {
-; CHECK-LABEL: multi_vector_cvt_x2_f32_u32:
+define {<vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x2_u32_f32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1) {
+; CHECK-LABEL: multi_vector_cvt_x2_u32_f32:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:    mov z3.d, z2.d
 ; CHECK-NEXT:    mov z2.d, z1.d
 ; CHECK-NEXT:    fcvtzu { z0.s, z1.s }, { z2.s, z3.s }
 ; CHECK-NEXT:    ret
-  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvtu.x2.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1)
-  ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %res
+  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvtu.x2.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1)
+  ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %res
 }
 
-define {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x4_f32_u32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3) {
-; CHECK-LABEL: multi_vector_cvt_x4_f32_u32:
+define {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @multi_vector_cvt_x4_u32_f32(<vscale x 4 x i32> %unused, <vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3) {
+; CHECK-LABEL: multi_vector_cvt_x4_u32_f32:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:    mov z7.d, z4.d
 ; CHECK-NEXT:    mov z6.d, z3.d
 ; CHECK-NEXT:    mov z5.d, z2.d
 ; CHECK-NEXT:    mov z4.d, z1.d
 ; CHECK-NEXT:    fcvtzu { z0.s - z3.s }, { z4.s - z7.s }
 ; CHECK-NEXT:    ret
-  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvtu.x4.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3)
-  ret {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} %res
+  %res = call {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.aarch64.sve.fcvtu.x4.nxv4f32(<vscale x 4 x float> %zn0, <vscale x 4 x float> %zn1, <vscale x 4 x float> %zn2, <vscale x 4 x float> %zn3)
+  ret {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} %res
 }
 
 ;
 ; SCVTF
 ;
-define {<vscale x 4 x float>, <vscale x 4 x float>} @multi_vector_cvt_x2_s32_f32(<vscale x 4 x float> %unused, <vscale x 4 x i32> %zn0, <vscale x 4 x i32> %zn1) {
-; CHECK-LABEL: multi_vector_cvt_x2_s32_f32:
+define {<vscale x 4 x float>, <vscale x 4 x float>} @multi_vector_cvt_x2_f32_s32(<vscale x 4 x float> %unused, <vscale x 4 x i32> %zn0, <vscale x 4 x i32> %zn1) {
+; CHECK-LABEL: multi_vector_cvt_x2_f32_s32:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:    mov z3.d, z2.d
 ; CHECK-NEXT:    mov z2.d, z1.d
 ; CHECK-NEXT:    scvtf { z0.s, z1.s }, { z2.s, z3.s }
 ; CHECK-NEXT:    ret
-  %res = call {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.aarch64.sve.scvtf.x2.nxv4f32(<vscale x 4 x i32> %zn0, <vscale x 4 x i32> %zn1)
-  ret {<vscale x 4 x float>, <vscale x 4 x float>} %res
+  %res = call {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.aarch64.sve.scvtf.x2.nxv4i32(<vscale x 4 x i32> %zn0, <vscale x 4 x i32> %zn1)

david-arm wrote:

Shouldn't the intrinsic name be

`@llvm.aarch64.sve.scvtf.x2.nxv4f32`

because the intrinsics are all keyed off the floating point type, with bitcasts 
of the variable FP type to an integer type. I realise this does seem to work, 
but perhaps it's clearer to use the correct type.

https://github.com/llvm/llvm-project/pull/77656
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang][SME2] Add builtins for multi-vector fp round to integral value (PR #75941)

2023-12-21 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM. Absolute perfection!

https://github.com/llvm/llvm-project/pull/75941
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang-tools-extra] [clang] [Clang][SME2] Enable multi-vector loads & stores for SME2 (PR #75821)

2023-12-21 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! A lovely patch. :)

https://github.com/llvm/llvm-project/pull/75821
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [clang-tools-extra] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-22 Thread David Sherwood via cfe-commits


@@ -0,0 +1,816 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===----------------------------------------------------------------------===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt<bool>
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt<bool> DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt<bool> VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated by the AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl<BasicBlock *> &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Instruction *Index,
+Value *Start, Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+PHINode *IndPhi, Value *MaxLen, Instruction *Index,
+Value *Start, bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired<DominatorTreeWrapperPass>();
+AU.addRequired<LoopInfoWrapperPass>();
+AU.addRequired<TargetTransformInfoWrapperPass>();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+  auto *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+  auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout
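
For reference, the byte-compare idiom described in the header comment quoted above corresponds to a small function like this sketch (editorial example, not taken from the patch):

```
// The loop shape the pass recognizes: return the index of the first
// mismatching byte, scanning from i+1 up to (but excluding) n.
#include <cstddef>

size_t firstMismatch(const unsigned char *a, const unsigned char *b,
                     size_t i, size_t n) {
  while (++i != n) {
    if (a[i] != b[i])
      break;
  }
  return i; // first mismatch index, or n if the ranges are equal
}
```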

[clang] [llvm] [clang-tools-extra] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-22 Thread David Sherwood via cfe-commits


@@ -0,0 +1,816 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass implements a pass that recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Instruction *Index,
+Value *Start, Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+PHINode *IndPhi, Value *MaxLen, Instruction *Index,
+Value *Start, bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout

[clang-tools-extra] [llvm] [clang] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-22 Thread David Sherwood via cfe-commits


@@ -0,0 +1,816 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass implements a pass that recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Instruction *Index,
+Value *Start, Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+PHINode *IndPhi, Value *MaxLen, Instruction *Index,
+Value *Start, bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout

[clang-tools-extra] [llvm] [clang] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-22 Thread David Sherwood via cfe-commits


@@ -0,0 +1,816 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass implements a pass that recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Instruction *Index,
+Value *Start, Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+PHINode *IndPhi, Value *MaxLen, Instruction *Index,
+Value *Start, bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout

[llvm] [clang] [clang-tools-extra] [AArch64] Add an AArch64 pass for loop idiom transformations (PR #72273)

2023-12-22 Thread David Sherwood via cfe-commits


@@ -0,0 +1,816 @@
+//===- AArch64LoopIdiomTransform.cpp - Loop idiom recognition 
-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass implements a pass that recognizes certain loop idioms and
+// transforms them into more optimized versions of the same loop. In cases
+// where this happens, it can be a significant performance win.
+//
+// We currently only recognize one loop that finds the first mismatched byte
+// in an array and returns the index, i.e. something like:
+//
+//  while (++i != n) {
+//if (a[i] != b[i])
+//  break;
+//  }
+//
+// In this example we can actually vectorize the loop despite the early exit,
+// although the loop vectorizer does not support it. It requires some extra
+// checks to deal with the possibility of faulting loads when crossing page
+// boundaries. However, even with these checks it is still profitable to do the
+// transformation.
+//
+//===--===//
+//
+// TODO List:
+//
+// * When optimizing for code size we may want to avoid some transformations.
+// * We can also support the inverse case where we scan for a matching element.
+//
+//===--===//
+
+#include "AArch64LoopIdiomTransform.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
+#include "llvm/Analysis/LoopPass.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "aarch64-loop-idiom-transform"
+
+static cl::opt
+DisableAll("disable-aarch64-lit-all", cl::Hidden, cl::init(false),
+   cl::desc("Disable AArch64 Loop Idiom Transform Pass."));
+
+static cl::opt DisableByteCmp(
+"disable-aarch64-lit-bytecmp", cl::Hidden, cl::init(false),
+cl::desc("Proceed with AArch64 Loop Idiom Transform Pass, but do "
+ "not convert byte-compare loop(s)."));
+
+static cl::opt VerifyLoops(
+"aarch64-lit-verify", cl::Hidden, cl::init(false),
+cl::desc("Verify loops generated AArch64 Loop Idiom Transform Pass."));
+
+namespace llvm {
+
+void initializeAArch64LoopIdiomTransformLegacyPassPass(PassRegistry &);
+Pass *createAArch64LoopIdiomTransformPass();
+
+} // end namespace llvm
+
+namespace {
+
+class AArch64LoopIdiomTransform {
+  Loop *CurLoop = nullptr;
+  DominatorTree *DT;
+  LoopInfo *LI;
+  const TargetTransformInfo *TTI;
+  const DataLayout *DL;
+
+public:
+  explicit AArch64LoopIdiomTransform(DominatorTree *DT, LoopInfo *LI,
+ const TargetTransformInfo *TTI,
+ const DataLayout *DL)
+  : DT(DT), LI(LI), TTI(TTI), DL(DL) {}
+
+  bool run(Loop *L);
+
+private:
+  /// \name Countable Loop Idiom Handling
+  /// @{
+
+  bool runOnCountableLoop();
+  bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
+  SmallVectorImpl &ExitBlocks);
+
+  bool recognizeByteCompare();
+  Value *expandFindMismatch(IRBuilder<> &Builder, GetElementPtrInst *GEPA,
+GetElementPtrInst *GEPB, Instruction *Index,
+Value *Start, Value *MaxLen);
+  void transformByteCompare(GetElementPtrInst *GEPA, GetElementPtrInst *GEPB,
+PHINode *IndPhi, Value *MaxLen, Instruction *Index,
+Value *Start, bool IncIdx, BasicBlock *FoundBB,
+BasicBlock *EndBB);
+  /// @}
+};
+
+class AArch64LoopIdiomTransformLegacyPass : public LoopPass {
+public:
+  static char ID;
+
+  explicit AArch64LoopIdiomTransformLegacyPass() : LoopPass(ID) {
+initializeAArch64LoopIdiomTransformLegacyPassPass(
+*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+return "Transform AArch64-specific loop idioms";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+AU.addRequired();
+AU.addRequired();
+AU.addRequired();
+  }
+
+  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
+};
+
+bool AArch64LoopIdiomTransformLegacyPass::runOnLoop(Loop *L,
+LPPassManager &LPM) {
+
+  if (skipLoop(L))
+return false;
+
+  auto *DT = &getAnalysis().getDomTree();
+  auto *LI = &getAnalysis().getLoopInfo();
+  auto &TTI = getAnalysis().getTTI(
+  *L->getHeader()->getParent());
+  return AArch64LoopIdiomTransform(
+ DT, LI, &TTI, &L->getHeader()->getModule()->getDataLayout

[clang] [AArch64][Clang] Refactor code to emit SVE & SME builtins (PR #70959)

2023-11-02 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM!

https://github.com/llvm/llvm-project/pull/70959


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits

https://github.com/david-arm commented:

Thanks for this! I've not done an exhaustive review, but I'll leave the 
comments I have so far.

https://github.com/llvm/llvm-project/pull/70474


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/70474


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
   auto VectorTy = cast<llvm::ScalableVectorType>(Ops.back()->getType());
   auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
 
-  Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy);
+  auto PredTy = MemoryTy;
+  auto AddrMemoryTy = MemoryTy;
+  bool IsTruncatingStore = true;

david-arm wrote:

Same comment as in EmitSVEMaskedLoad. Perhaps better just to have an
IsQuadStore boolean, since it's an exceptional case and unlikely to have
commonality with other instructions?
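
A sketch of what that might look like on the store side (illustrative only:
the st1wq/st1dq intrinsic names are assumed by analogy with the ld1uwq/ld1udq
loads discussed in this thread and may not match the patch exactly):

  // Hypothetical special-casing of the quad-word stores, mirroring the
  // IsQuadLoad suggestion for EmitSVEMaskedLoad.
  bool IsQuadStore = false;
  switch (IntrinsicID) {
  case Intrinsic::aarch64_sve_st1wq:
  case Intrinsic::aarch64_sve_st1dq:
    IsQuadStore = true;
    break;
  default:
    break;
  }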

https://github.com/llvm/llvm-project/pull/70474


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 
+//
+// SVE2.1 - Contiguous loads to quadword (single vector)
+//
+
+class SVE2p1_Single_Load_Quadword
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                            [llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrReadMem]>;

david-arm wrote:

I think this should also have IntrArgMemOnly, similar to
AdvSIMD_1Vec_Load_Intrinsic.
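
For illustration, the class with the suggested flag added would read as below
(a sketch of the suggestion only, not the committed definition):

  class SVE2p1_Single_Load_Quadword
      : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                              [llvm_nxv1i1_ty, llvm_ptr_ty],
                              // IntrArgMemOnly: the intrinsic only accesses
                              // memory through its pointer argument.
                              [IntrReadMem, IntrArgMemOnly]>;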

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;

david-arm wrote:

I personally think using this variable is misleading because aarch64_sve_ld1uwq
is actually an extending load - we're extending from 32-bit memory elements to
128-bit integer elements. So it looks odd when we set this to false. Perhaps
it's better to just explicitly have a variable called `IsQuadLoad` and use that
instead rather than try to generalise this. The quad-word loads are really
just an exception here because we're working around the lack of a
<vscale x 1 x i128> type. So you'd have something like

  case Intrinsic::aarch64_sve_ld1uwq:
    IsQuadLoad = true;
    ...
  default:
    IsQuadLoad = false;


  Function *F =
      CGM.getIntrinsic(IntrinsicID, IsQuadLoad ? VectorTy : MemoryTy);

  ...

  if (IsQuadLoad)
    return Load;

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
   auto VectorTy = cast<llvm::ScalableVectorType>(Ops.back()->getType());
   auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
 
-  Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy);
+  auto PredTy = MemoryTy;
+  auto AddrMemoryTy = MemoryTy;
+  bool IsTruncatingStore = true;
+  ;

david-arm wrote:

Extra ; here

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;
+  switch (IntrinsicID) {
+  case Intrinsic::aarch64_sve_ld1uwq:
+  case Intrinsic::aarch64_sve_ld1udq:
+    MemoryTy = llvm::ScalableVectorType::get(MemEltTy, 1);
+    PredTy =
+        llvm::ScalableVectorType::get(IntegerType::get(getLLVMContext(), 1), 1);

david-arm wrote:

You can just do 
llvm::ScalableVectorType::get(Type::getInt1Ty(getLLVMContext()), 1);

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 
+//
+// SVE2.1 - Contiguous loads to quadword (single vector)
+//
+
+class SVE2p1_Single_Load_Quadword
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                            [llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrReadMem]>;
+def int_aarch64_sve_ld1uwq : SVE2p1_Single_Load_Quadword;
+def int_aarch64_sve_ld1udq : SVE2p1_Single_Load_Quadword;
+
+//
+// SVE2.1 - Contiguous store from quadword (single vector)
+//
+
+class SVE2p1_Single_Store_Quadword
+    : DefaultAttrsIntrinsic<[],
+                            [llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrArgMemOnly]>;

david-arm wrote:

This also needs the IntrWriteMem flag, otherwise we could end up incorrectly
rescheduling stores.
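
For illustration, the store class with both flags would read as below (a
sketch of the suggestion only, not the committed definition):

  class SVE2p1_Single_Store_Quadword
      : DefaultAttrsIntrinsic<[],
                              [llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty],
                              // IntrWriteMem marks the write so the store
                              // cannot be reordered or treated as read-only.
                              [IntrWriteMem, IntrArgMemOnly]>;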

https://github.com/llvm/llvm-project/pull/70474


[clang] [AArch64] Cast predicate operand of SVE gather loads/scatter stores to the parameter type of the intrinsic (NFC) (PR #71289)

2023-11-06 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM!

https://github.com/llvm/llvm-project/pull/71289


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-20 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM. I think I would have preferred the patch to be split up into 3 - one for 
contiguous extending loads/truncating stores, one for structured loads/stores, 
and one for the gathers. That's why it took me so long to review this patch as 
I was constantly trying to keep all the information about each 
builtin/instruction in my head whilst reviewing the tests for correctness!

https://github.com/llvm/llvm-project/pull/70474


[clang] [Clang] Emit TBAA info for enums in C (PR #73326)

2023-11-24 Thread David Sherwood via cfe-commits

https://github.com/david-arm created 
https://github.com/llvm/llvm-project/pull/73326

When emitting TBAA information for enums in C code we currently just treat the 
data as an 'omnipotent char'. However, with C strict aliasing this means we 
fail to optimise certain cases. For example, in the SPEC2017 xz benchmark there 
are structs that contain arrays of enums, and clang pessimistically assumes that 
accesses to those enums could alias with other struct members that have a 
different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be large enough to hold all 
the enumerated values. In the latter case the compiler is free to choose a 
suitable integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer of the size clang has 
already chosen for the enum. I have ignored C++ for now because the rules are 
more complex.

New test added here:

  clang/test/CodeGen/tbaa.c
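
To make the win concrete, here is a minimal sketch of the xz-like pattern
(hypothetical types and names, not taken from the benchmark or the new test):

  typedef enum { S_IDLE, S_RUN } State;

  struct Stream {
    State state[4];
    unsigned pos;
  };

  unsigned sum_pos(struct Stream *s, unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i) {
      s->state[i & 3] = S_RUN; /* previously tagged 'omnipotent char', so this
                                  store was assumed to possibly alias s->pos */
      total += s->pos;         /* with int-based TBAA this load no longer has
                                  to be repeated after every enum store */
    }
    return total;
  }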

>From af76f6b6b3469fd0f5f24427c5a175c8d9d7c83a Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Fri, 24 Nov 2023 13:20:23 +
Subject: [PATCH] [Clang] Emit TBAA info for enums in C

When emitting TBAA information for enums in C code we
currently just treat the data as an 'omnipotent char'.
However, with C strict aliasing this means we fail to
optimise certain cases. For example, in the SPEC2017
xz benchmark there are structs that contain arrays of
enums, and clang pessimistically assumes that accesses
to those enums could alias with other struct members
that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be
large enough to hold all the enumerated values. In the
latter case the compiler is free to choose a suitable
integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer
of the size clang has already chosen for the enum. I
have ignored C++ for now because the rules are more
complex.

New test added here:

  clang/test/CodeGen/tbaa.c
---
 clang/lib/CodeGen/CodeGenTBAA.cpp |   5 +-
 clang/test/CodeGen/tbaa.c | 116 ++
 2 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/tbaa.c

diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp 
b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
     // In C++ mode, types have linkage, so we can rely on the ODR and
     // on their mangled names, if they're external.
     // TODO: Is there a way to get a program-wide unique name for a
     // decl with local linkage or no linkage?
-    if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+    if (!ETy->getDecl()->isExternallyVisible())
       return getChar();
 
     SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa 
-disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s 
-emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s 
-emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing 
-disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+  RED_AUTO_32,
+  GREEN_AUTO_32,
+  BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+  RED_AUTO_64,
+  GREEN_AUTO_64,
+  BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+  RED_16,
+  GREEN_16,
+  BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+  RED_8,
+  GREEN_8,
+  BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]

[clang] [Clang] Emit TBAA info for enums in C (PR #73326)

2023-11-24 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/73326


[clang] [Clang] Emit TBAA info for enums in C (PR #73326)

2023-11-24 Thread David Sherwood via cfe-commits


@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));

david-arm wrote:

I am not sure if this is entirely correct so would appreciate some guidance 
here!

https://github.com/llvm/llvm-project/pull/73326


[clang] [Clang] Emit TBAA info for enums in C (PR #73326)

2023-11-27 Thread David Sherwood via cfe-commits

https://github.com/david-arm updated 
https://github.com/llvm/llvm-project/pull/73326

>From af76f6b6b3469fd0f5f24427c5a175c8d9d7c83a Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Fri, 24 Nov 2023 13:20:23 +
Subject: [PATCH 1/2] [Clang] Emit TBAA info for enums in C

When emitting TBAA information for enums in C code we
currently just treat the data as an 'omnipotent char'.
However, with C strict aliasing this means we fail to
optimise certain cases. For example, in the SPEC2017
xz benchmark there are structs that contain arrays of
enums, and clang pessimistically assumes that accesses
to those enums could alias with other struct members
that have a different type.

According to

https://en.cppreference.com/w/c/language/enum

enums should be treated as 'int' types unless
explicitly specified (C23) or if 'int' would not be
large enough to hold all the enumerated values. In the
latter case the compiler is free to choose a suitable
integer that would hold all such values.

When compiling C code this patch generates TBAA
information for the enum by using an equivalent integer
of the size clang has already chosen for the enum. I
have ignored C++ for now because the rules are more
complex.

New test added here:

  clang/test/CodeGen/tbaa.c
---
 clang/lib/CodeGen/CodeGenTBAA.cpp |   5 +-
 clang/test/CodeGen/tbaa.c | 116 ++
 2 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/tbaa.c

diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp 
b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8705d3d65f1a573..f59d3d422d5209d 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));
+
     // In C++ mode, types have linkage, so we can rely on the ODR and
     // on their mangled names, if they're external.
     // TODO: Is there a way to get a program-wide unique name for a
     // decl with local linkage or no linkage?
-    if (!Features.CPlusPlus || !ETy->getDecl()->isExternallyVisible())
+    if (!ETy->getDecl()->isExternallyVisible())
       return getChar();
 
     SmallString<256> OutName;
diff --git a/clang/test/CodeGen/tbaa.c b/clang/test/CodeGen/tbaa.c
new file mode 100644
index 000..0ab81f60a71941c
--- /dev/null
+++ b/clang/test/CodeGen/tbaa.c
@@ -0,0 +1,116 @@
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -no-struct-path-tbaa 
-disable-llvm-passes %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -disable-llvm-passes %s 
-emit-llvm -o - | FileCheck %s -check-prefixes=PATH
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O0 -disable-llvm-passes %s 
-emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -O1 -relaxed-aliasing 
-disable-llvm-passes %s -emit-llvm -o - | FileCheck %s -check-prefix=NO-TBAA
+// Test TBAA metadata generated by front-end.
+//
+// NO-TBAA-NOT: !tbaa
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+typedef unsigned long long uint64_t;
+
+typedef enum {
+  RED_AUTO_32,
+  GREEN_AUTO_32,
+  BLUE_AUTO_32
+} EnumAuto32;
+
+typedef enum {
+  RED_AUTO_64,
+  GREEN_AUTO_64,
+  BLUE_AUTO_64 = 0x100000000ull
+} EnumAuto64;
+
+typedef enum : uint16_t {
+  RED_16,
+  GREEN_16,
+  BLUE_16
+} Enum16;
+
+typedef enum : uint8_t {
+  RED_8,
+  GREEN_8,
+  BLUE_8
+} Enum8;
+
+uint32_t g0(EnumAuto32 *E, uint32_t *val) {
+// CHECK-LABEL: define{{.*}} i32 @g0(
+// CHECK: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// CHECK: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// CHECK: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH-LABEL: define{{.*}} i32 @g0(
+// PATH: store i32 5, ptr %{{.*}}, align 4, !tbaa [[TAG_i32:!.*]]
+// PATH: store i32 0, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+// PATH: load i32, ptr %{{.*}}, align 4, !tbaa [[TAG_i32]]
+  *val = 5;
+  *E = RED_AUTO_32;
+  return *val;
+}
+
+uint64_t g1(EnumAuto64 *E, uint64_t *val) {
+// CHECK-LABEL: define{{.*}} i64 @g1(
+// CHECK: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// CHECK: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// CHECK: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+// PATH-LABEL: define{{.*}} i64 @g1(
+// PATH: store i64 5, ptr %{{.*}}, align 8, !tbaa [[TAG_i64:!.*]]
+// PATH: store i64 0, ptr %{{.*}}, align 8, !tbaa [[TAG_long:!.*]]
+// PATH: load i64, ptr %{{.*}}, align 8, !tbaa [[TAG_i64]]
+  *val = 5;
+  *E = RED_AUTO_64;
+  return *val;
+}
+
+uint16_t g2(Enum16 *E, uint16_t *val) {
+// CHECK-LABEL: define{{.*}} i16 @g2(
+// CHECK: store i16 5, ptr %{{.*}}, align 2, !tbaa [[TA

[clang] [Clang] Emit TBAA info for enums in C (PR #73326)

2023-11-27 Thread David Sherwood via cfe-commits


@@ -196,11 +196,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
   // Enum types are distinct types. In C++ they have "underlying types",
   // however they aren't related for TBAA.
   if (const EnumType *ETy = dyn_cast<EnumType>(Ty)) {
+    if (!Features.CPlusPlus)
+      return getTypeInfo(Context.getIntTypeForBitwidth(Size * 8, 0));

david-arm wrote:

Good suggestion - thanks!

https://github.com/llvm/llvm-project/pull/73326


[clang] [AArch64][SME2] Add multi-vector SEL (x2, x4) ACLE builtins & intrinsics (PR #73188)

2023-11-28 Thread David Sherwood via cfe-commits


@@ -0,0 +1,384 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -passes=mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -passes=mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall 
-emit-llvm -o - %s | opt -S -passes=mem2reg,instcombine,tailcallelim | 
FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall 
-emit-llvm -o - -x c++ %s | opt -S -passes=mem2reg,instcombine,tailcallelim | 
FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +sme2 -target-feature -S -disable-O0-optnone -Werror -Wall -o 
/dev/null %s
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1
+#else
+#define SVE_ACLE_FUNC(A1,A2) A1##A2
+#endif
+
+// 8-bit ZIPs

david-arm wrote:

I think this comment should say "8-bit SELs" and similarly for all the other 
comments in both the selx2 and selx4 files.

https://github.com/llvm/llvm-project/pull/73188


[clang] [AArch64][SME2] Add multi-vector SEL (x2, x4) ACLE builtins & intrinsics (PR #73188)

2023-11-28 Thread David Sherwood via cfe-commits




david-arm wrote:

Should the file be renamed to acle_sme2_vector_selx4? This would make it 
consistent with the existing acle_sme2_vector_add.c file, which also has 
SVE-like instructions that only operate on SVE vectors.

https://github.com/llvm/llvm-project/pull/73188


[clang] [AArch64][SME2] Add multi-vector SEL (x2, x4) ACLE builtins & intrinsics (PR #73188)

2023-11-28 Thread David Sherwood via cfe-commits




david-arm wrote:

Should the file be renamed to acle_sme2_vector_selx2? This would make it 
consistent with the existing acle_sme2_vector_add.c file, which also has 
SVE-like instructions that only operate on SVE vectors.

https://github.com/llvm/llvm-project/pull/73188


[clang] [AArch64][SME2] Add multi-vector SEL (x2, x4) ACLE builtins & intrinsics (PR #73188)

2023-11-29 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! Thanks for the changes. :)

https://github.com/llvm/llvm-project/pull/73188


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-25 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-25 Thread David Sherwood via cfe-commits

https://github.com/david-arm commented:

This looks a lot better now @kmclaughlin-arm - thanks for the changes! I just 
have a couple of comments about the tests that I missed previously...

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-25 Thread David Sherwood via cfe-commits


@@ -0,0 +1,418 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -target-feature +sme-i16i64 -target-feature +sme-f64f64 
-target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -target-feature +sme-i16i64 -target-feature +sme-f64f64 
-target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED,A5) A1##A3##A5
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4,A5) A1##A2##A3##A4##A5
+#endif
+
+//
+// Single-Multi
+//
+
+// x2
+// CHECK-LABEL: @test_svsub_write_single2_u32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.sub.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: 
@_Z28test_svsub_write_single2_u32j12svuint32x2_tu12__SVUint32_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CPP-CHECK-NEXT:tail call void 
@llvm.aarch64.sme.sub.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CPP-CHECK-NEXT:ret void
+//
+void test_svsub_write_single2_u32(uint32_t slice_base, svuint32x2_t zn, 
svuint32_t zm) {
+  SVE_ACLE_FUNC(svsub_write,_single,_za32,_u32,_vg1x2)(slice_base + 7, zn, zm);
+}
+
+// CHECK-LABEL: @test_svsub_write_single2_u64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN]], i64 2)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.sub.write.single.za.vg1x2.nxv2i64(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: 
@_Z28test_svsub_write_single2_u64j12svuint64x2_tu12__SVUint64_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN]], i64 2)
+// CPP-CHECK-NEXT:tail call void 
@llvm.aarch64.sme.sub.write.single.za.vg1x2.nxv2i64(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CPP-CHECK-NEXT:ret void
+//
+void test_svsub_write_single2_u64(uint32_t slice_base, svuint64x2_t zn, 
svuint64_t zm) {
+  SVE_ACLE_FUNC(svsub_write,_single,_za64,_u64,_vg1x2)(slice_base + 7, zn, zm);
+}
+
+// x4
+
+// CHECK-LABEL: @test_svsub_write_single4_u32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv16i32( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv16i32( [[ZN]], i64 4)
+// CHECK-NEXT:[[TMP2:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv16i32( [[ZN]], i64 8)
+// CHECK-NEXT:[[TMP3:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv16i32( [[ZN]], i64 12)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.sub.write.single.za.vg1x4.nxv4i32(i32 [[ADD]],  [[TMP0]], 

[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-25 Thread David Sherwood via cfe-commits


@@ -0,0 +1,1226 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -target-feature +sme-i16i64 -target-feature +sme-f64f64 
-target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -target-feature +sme-i16i64 -target-feature +sme-f64f64 
-target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 
-target-feature +sme-i16i64 -target-feature +sme-f64f64 -target-feature +sve -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED,A5) A1##A3##A5
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4,A5) A1##A2##A3##A4##A5
+#endif
+
+//
+// Single-Multi
+//
+
+// x2
+// CHECK-LABEL: @test_svadd_write_single2_s32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.add.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: 
@_Z28test_svadd_write_single2_s32j11svint32x2_tu11__SVInt32_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CPP-CHECK-NEXT:tail call void 
@llvm.aarch64.sme.add.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CPP-CHECK-NEXT:ret void
+//
+void test_svadd_write_single2_s32(uint32_t slice_base, svint32x2_t zn, 
svint32_t zm) {
+  SVE_ACLE_FUNC(svadd_write,_single,_za32,_s32,_vg1x2)(slice_base + 7, zn, zm);
+}
+
+// CHECK-LABEL: @test_svadd_write_single2_u32(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.add.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: 
@_Z28test_svadd_write_single2_u32j12svuint32x2_tu12__SVUint32_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv4i32.nxv8i32( [[ZN]], i64 4)
+// CPP-CHECK-NEXT:tail call void 
@llvm.aarch64.sme.add.write.single.za.vg1x2.nxv4i32(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CPP-CHECK-NEXT:ret void
+//
+void test_svadd_write_single2_u32(uint32_t slice_base, svuint32x2_t zn, 
svuint32_t zm) {
+  SVE_ACLE_FUNC(svadd_write,_single,_za32,_u32,_vg1x2)(slice_base + 7, zn, zm);
+}
+
+// CHECK-LABEL: @test_svadd_write_single2_s64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ADD:%.*]] = add i32 [[SLICE_BASE:%.*]], 7
+// CHECK-NEXT:[[TMP0:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN:%.*]], i64 0)
+// CHECK-NEXT:[[TMP1:%.*]] = tail call  
@llvm.vector.extract.nxv2i64.nxv4i64( [[ZN]], i64 2)
+// CHECK-NEXT:tail call void 
@llvm.aarch64.sme.add.write.single.za.vg1x2.nxv2i64(i32 [[ADD]],  [[TMP0]],  [[TMP1]],  [[ZM:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: 
@_Z28test_svadd_write_single2_s64j11svint64x2_tu11__SVInt64_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:[[ADD:%.*]] = add i32 [

[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-27 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! Excellent! Thanks for the changes @kmclaughlin-arm.

https://github.com/llvm/llvm-project/pull/69725


[clang] [SVE][InstCombine] Delete redundant sel instructions with ptrue (PR #68463)

2023-10-10 Thread David Sherwood via cfe-commits


@@ -800,6 +800,13 @@ instCombineConvertFromSVBool(InstCombiner &IC, IntrinsicInst &II) {
 
 static std::optional<Instruction *> instCombineSVESel(InstCombiner &IC,
                                                       IntrinsicInst &II) {
+  // svsel(ptrue, x, y) => x
+  auto *OpPredicate = II.getOperand(0);

david-arm wrote:

This looks like a valid optimisation, but it also suggests that the ACLE code 
written in C/C++ would benefit from being rewritten in a way that avoids this.
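
For illustration, a hypothetical ACLE snippet (not taken from the patch) that produces such a redundant select; with an all-true predicate, svsel simply yields its first vector operand:

```
#include <arm_sve.h>

svint32_t add_then_select(svint32_t a, svint32_t b) {
  svbool_t pg = svptrue_b32();            // all-true predicate
  svint32_t sum = svadd_s32_x(pg, a, b);
  return svsel_s32(pg, sum, a);           // folds to plain 'sum'
}
```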

https://github.com/llvm/llvm-project/pull/68463


[clang] [SVE][InstCombine] Delete redundant sel instructions with ptrue (PR #68463)

2023-10-10 Thread David Sherwood via cfe-commits


@@ -63,6 +63,20 @@ svint32_t test_svsel_s32(svbool_t pg, svint32_t op1, 
svint32_t op2)
   return SVE_ACLE_FUNC(svsel,_s32,,)(pg, op1, op2);
 }
 
+// CHECK-LABEL: @test_svsel_s32_ptrue(

david-arm wrote:

I'm not sure if this test really adds any more value, since the other test in 
Transforms/InstCombine/AArch64/sve-intrinsic-sel.ll is also testing the 
InstCombine.

https://github.com/llvm/llvm-project/pull/68463


[clang] [CXXNameMangler] Correct the mangling of SVE ACLE types within function names. (PR #69460)

2023-10-19 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! An outstanding work of art @paulwalker-arm!

https://github.com/llvm/llvm-project/pull/69460


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits


@@ -9893,24 +9888,37 @@ Value *CodeGenFunction::FormSVEBuiltinResult(Value 
*Call) {
   return Call;
 }
 
-Value *CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
-  const CallExpr *E) {
+void CodeGenFunction::GetAArch64SMEProcessedOperands(

david-arm wrote:

I wonder if this would actually be better named GetAArch64SVEProcessedOperands, 
because if we have to choose a name that's common to both the SME and SVE 
builtins, choosing SVE makes more sense: we're dealing with scalable vectors in 
general here, not something intrinsically linked to SME.

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits


@@ -1016,29 +1021,24 @@ std::string Intrinsic::mangleName(ClassKind LocalCK) 
const {
  getMergeSuffix();
 }
 
-void Intrinsic::emitIntrinsic(raw_ostream &OS, SVEEmitter &Emitter) const {
+void Intrinsic::emitIntrinsic(raw_ostream &OS, ACLEKind Kind) const {
   bool IsOverloaded = getClassKind() == ClassG && getProto().size() > 1;
 
   std::string FullName = mangleName(ClassS);
   std::string ProtoName = mangleName(getClassKind());
   std::string SMEAttrs = "";
 
-  if (Flags & Emitter.getEnumValueForFlag("IsStreaming"))
-    SMEAttrs += ", arm_streaming";
-  if (Flags & Emitter.getEnumValueForFlag("IsStreamingCompatible"))
-    SMEAttrs += ", arm_streaming_compatible";
-  if (Flags & Emitter.getEnumValueForFlag("IsSharedZA"))
-    SMEAttrs += ", arm_shared_za";
-  if (Flags & Emitter.getEnumValueForFlag("IsPreservesZA"))
-    SMEAttrs += ", arm_preserves_za";
-
   OS << (IsOverloaded ? "__aio " : "__ai ")
-     << "__attribute__((__clang_arm_builtin_alias("
-     << (SMEAttrs.empty() ? "__builtin_sve_" : "__builtin_sme_")
-     << FullName << ")";
-  if (!SMEAttrs.empty())
-    OS << SMEAttrs;

david-arm wrote:

It looks like we're no longer printing out the attributes for the builtin - is 
this because the attributes are dealt with explicitly elsewhere in clang and so 
they are no longer needed?

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits

https://github.com/david-arm commented:

I've not done an exhaustive review, but thought I'd leave the comments I have 
so far!

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits


@@ -10272,29 +10291,13 @@ Value 
*CodeGenFunction::EmitAArch64SMEBuiltinExpr(unsigned BuiltinID,
   getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);

david-arm wrote:

Do we still need this code given we're now checking the ICE arguments in 
GetAArch64SMEProcessedOperands?

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits


@@ -9893,24 +9888,37 @@ Value *CodeGenFunction::FormSVEBuiltinResult(Value 
*Call) {
   return Call;
 }
 
-Value *CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
-                                                  const CallExpr *E) {
+void CodeGenFunction::GetAArch64SMEProcessedOperands(
+    unsigned BuiltinID, const CallExpr *E, SmallVectorImpl<Value *> &Ops,
+    SVETypeFlags TypeFlags) {
   // Find out if any arguments are required to be integer constant expressions.
   unsigned ICEArguments = 0;
   ASTContext::GetBuiltinTypeError Error;
   getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
   assert(Error == ASTContext::GE_None && "Should not codegen an error");
 
-  llvm::Type *Ty = ConvertType(E->getType());
-  if (BuiltinID >= SVE::BI__builtin_sve_reinterpret_s8_s8 &&
-  BuiltinID <= SVE::BI__builtin_sve_reinterpret_f64_f64) {
-Value *Val = EmitScalarExpr(E->getArg(0));
-return EmitSVEReinterpret(Val, Ty);
-  }
+  bool IsTupleGetOrSet = TypeFlags.isTupleSet() || TypeFlags.isTupleGet();
 
-  llvm::SmallVector<Value *, 4> Ops;
   for (unsigned i = 0, e = E->getNumArgs(); i != e; i++) {
-    if ((ICEArguments & (1 << i)) == 0)
+    if (!IsTupleGetOrSet && (ICEArguments & (1 << i)) == 0) {

david-arm wrote:

Perhaps you can create a temp variable and reuse it so it's a bit clearer, i.e.

```
  bool IsICE = ICEArguments & (1 << i);
  if (!IsTupleGetOrSet && !IsICE) {
  ...
  } else if (!IsICE) {
  ...
```

https://github.com/llvm/llvm-project/pull/69725


[clang] [Clang][SME2] Add multi-vector add/sub builtins (PR #69725)

2023-10-20 Thread David Sherwood via cfe-commits


@@ -9893,24 +9888,37 @@ Value *CodeGenFunction::FormSVEBuiltinResult(Value 
*Call) {
   return Call;
 }
 
-Value *CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
-                                                  const CallExpr *E) {
+void CodeGenFunction::GetAArch64SMEProcessedOperands(
+    unsigned BuiltinID, const CallExpr *E, SmallVectorImpl<Value *> &Ops,
+    SVETypeFlags TypeFlags) {
   // Find out if any arguments are required to be integer constant expressions.
   unsigned ICEArguments = 0;
   ASTContext::GetBuiltinTypeError Error;
   getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
   assert(Error == ASTContext::GE_None && "Should not codegen an error");
 
-  llvm::Type *Ty = ConvertType(E->getType());
-  if (BuiltinID >= SVE::BI__builtin_sve_reinterpret_s8_s8 &&
-  BuiltinID <= SVE::BI__builtin_sve_reinterpret_f64_f64) {
-Value *Val = EmitScalarExpr(E->getArg(0));
-return EmitSVEReinterpret(Val, Ty);
-  }
+  bool IsTupleGetOrSet = TypeFlags.isTupleSet() || TypeFlags.isTupleGet();
 
-  llvm::SmallVector<Value *, 4> Ops;
   for (unsigned i = 0, e = E->getNumArgs(); i != e; i++) {
-    if ((ICEArguments & (1 << i)) == 0)

david-arm wrote:

Might be worth adding a comment explaining why we explicitly ignore tuple 
get/set functions?

https://github.com/llvm/llvm-project/pull/69725


[clang] 4cf11d8 - [Clang][SVE] Permit specific predicate-as-counter registers in inline assembly

2023-07-25 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2023-07-25T08:55:45Z
New Revision: 4cf11d8a65dfded59761ec52804a86277b9c0036

URL: 
https://github.com/llvm/llvm-project/commit/4cf11d8a65dfded59761ec52804a86277b9c0036
DIFF: 
https://github.com/llvm/llvm-project/commit/4cf11d8a65dfded59761ec52804a86277b9c0036.diff

LOG: [Clang][SVE] Permit specific predicate-as-counter registers in inline 
assembly

This patch adds the predicate-as-counter registers pn0-pn15 to the
list of supported registers used when writing inline assembly.
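
For context, a minimal (illustrative) use of the new register names, mirroring
the test added below; this requires a target with SVE2.1:

```
void use_pn_regs(void) {
  asm volatile("ptrue pn8.b\n"
               "pext  p3.b, pn8[1]\n"
               ::: "pn8", "p3"); // pn8 is now accepted as a clobber
}
```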

Tests added to

  clang/test/CodeGen/aarch64-sve-inline-asm.c

Differential Revision: https://reviews.llvm.org/D156115

Added: 


Modified: 
clang/lib/Basic/Targets/AArch64.cpp
clang/test/CodeGen/aarch64-sve-inline-asm.c

Removed: 




diff  --git a/clang/lib/Basic/Targets/AArch64.cpp 
b/clang/lib/Basic/Targets/AArch64.cpp
index ed0246d6faee16..7c4cc5fb33f886 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -1164,7 +1164,11 @@ const char *const AArch64TargetInfo::GCCRegNames[] = {
 
 // SVE predicate registers
 "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",  "p8",  "p9",  
"p10",
-"p11", "p12", "p13", "p14", "p15"
+"p11", "p12", "p13", "p14", "p15",
+
+// SVE predicate-as-counter registers
+"pn0",  "pn1",  "pn2",  "pn3",  "pn4",  "pn5",  "pn6",  "pn7",  "pn8",
+"pn9",  "pn10", "pn11", "pn12", "pn13", "pn14", "pn15"
 };
 
 ArrayRef<const char *> AArch64TargetInfo::getGCCRegNames() const {

diff  --git a/clang/test/CodeGen/aarch64-sve-inline-asm.c 
b/clang/test/CodeGen/aarch64-sve-inline-asm.c
index 8f26680e08f4c5..428aa32e7f98d3 100644
--- a/clang/test/CodeGen/aarch64-sve-inline-asm.c
+++ b/clang/test/CodeGen/aarch64-sve-inline-asm.c
@@ -1,4 +1,8 @@
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -emit-llvm -o - %s | 
FileCheck %s -check-prefix=CHECK
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +sve2p1 \
+// RUN:   -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve 
-target-feature +sve2p1 \
+// RUN:   -S -o /dev/null
 
 void test_sve_asm(void) {
   asm volatile(
@@ -9,5 +13,16 @@ void test_sve_asm(void) {
   :
   :
   : "z0", "z31", "p0", "p15");
+  // CHECK-LABEL: @test_sve_asm
   // CHECK: "~{z0},~{z31},~{p0},~{p15}"
 }
+
+void test_sve2p1_asm(void) {
+  asm("pfalse pn0.b\n"
+  "ptrue pn8.d\n"
+  "ptrue pn15.b\n"
+  "pext p3.b, pn8[1]\n"
+  ::: "pn0", "pn8", "pn15", "p3");
+  // CHECK-LABEL: @test_sve2p1_asm
+  // CHECK: "~{pn0},~{pn8},~{pn15},~{p3}"
+}





[clang] 2a48b69 - [IR] In ConstantFoldShuffleVectorInstruction use zeroinitializer for splats of 0

2021-11-10 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2021-11-10T09:42:58Z
New Revision: 2a48b6993a973e0ab2331e8c11dbd6e6100e2cfe

URL: 
https://github.com/llvm/llvm-project/commit/2a48b6993a973e0ab2331e8c11dbd6e6100e2cfe
DIFF: 
https://github.com/llvm/llvm-project/commit/2a48b6993a973e0ab2331e8c11dbd6e6100e2cfe.diff

LOG: [IR] In ConstantFoldShuffleVectorInstruction use zeroinitializer for 
splats of 0

When creating a splat of 0 for scalable vectors we tend to create them
using a combination of shufflevector and insertelement, i.e.

shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0),
               <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)

However, for the case of a zero splat we can actually just replace the
above with zeroinitializer instead. This makes the IR a lot simpler and
easier to read. I have changed ConstantFoldShuffleVectorInstruction to
use zeroinitializer when creating a splat of integer 0 or FP +0.0 values.

Differential Revision: https://reviews.llvm.org/D113394
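
A minimal sketch of the producer side (illustrative only; the helper name is
invented and this is the IRBuilder-level request, not the patched
ConstantFold.cpp code): asking for a scalable splat of 0 now yields plain
zeroinitializer.

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

Constant *zeroSplat(LLVMContext &Ctx) {
  auto *EltTy = Type::getInt32Ty(Ctx);
  auto *VecTy = ScalableVectorType::get(EltTy, 4); // <vscale x 4 x i32>
  // After this patch the folded constant is plain zeroinitializer.
  return ConstantVector::getSplat(VecTy->getElementCount(),
                                  ConstantInt::get(EltTy, 0));
}
```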

Added: 


Modified: 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
llvm/lib/IR/ConstantFold.cpp
llvm/test/Bitcode/vscale-round-trip.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-cond-inv-loads.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
llvm/test/Transforms/LoopVectorize/scalable-inductions.ll
llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll

Removed: 




diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
index 3c13080e14f70..1cd9ef1f1a277 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_dupq.c
@@ -568,7 +568,7 @@ svfloat64_t test_svdupq_n_f64(float64_t x0, float64_t x1)
 // CHECK-NEXT:    [[TMP16:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
 // CHECK-NEXT:    [[TMP17:%.*]] = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8> undef, <16 x i8> [[TMP15]], i64 0)
 // CHECK-NEXT:    [[TMP18:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.dupq.lane.nxv16i8(<vscale x 16 x i8> [[TMP17]], i64 0)
-// CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.cmpne.wide.nxv16i8(<vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP18]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 0, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer))
+// CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.cmpne.wide.nxv16i8(<vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP18]], <vscale x 2 x i64> zeroinitializer)
 // CHECK-NEXT:    ret <vscale x 16 x i1> [[TMP19]]
 //
 // CPP-CHECK-LABEL: @_Z16test_svdupq_n_b8(
@@ -608,7 +608,7 @@ svfloat64_t test_svdupq_n_f64(float64_t x0, float64_t x1)
 // CPP-CHECK-NEXT:    [[TMP16:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
 // CPP-CHECK-NEXT:    [[TMP17:%.*]] = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8> undef, <16 x i8> [[TMP15]], i64 0)
 // CPP-CHECK-NEXT:    [[TMP18:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.dupq.lane.nxv16i8(<vscale x 16 x i8> [[TMP17]], i64 0)
-// CPP-CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.cmpne.wide.nxv16i8(<vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP18]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 0, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer))
+// CPP-CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.cmpne.wide.nxv16i8(<vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP18]], <vscale x 2 x i64> zeroinitializer)
 // CPP-CHECK-NEXT:    ret <vscale x 16 x i1> [[TMP19]]
 //
 svbool_t test_svdupq_n_b8(bool x0, bool x1, bool x2, bool x3,
@@ -641,7 +641,7 @@ svbool_t test_svdupq_n_b8(bool x0, bool x1, bool x2, bool x3,
 // CHECK-NEXT:    [[TMP16:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
 // CHECK-NEXT:    [[TMP17:%.*]] = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16> undef, <8 x i16> [[TMP15]], i64 0)
 // CHECK-NEXT:    [[TMP18:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.dupq.lane.nxv8i16(<vscale x 8 x i16> [[TMP17]], i64 0)
-// CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.cmpne.wide.nxv8i16(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i16> [[TMP18]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 0, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer))
+// CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.cmpne.wide.nxv8i16(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i16> [[TMP18]], <vscale x 2 x i64> zeroinitializer)
 // CHECK-NEXT:    [[TMP20:%.*]] = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> [[TMP19]])
 // CHECK-NEXT:    ret <vscale x 16 x i1> [[TMP20]]
 //
@@ -666,7 +666,7 @@ svbool_t test_svdupq_n_b8(bool x0, bool x1, bool x2, bool x3,
 // CPP-CHECK-NEXT:    [[TMP16:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
 // CPP-CHECK-NEXT:    [[TMP17:%.*]] = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16> undef, <8 x i16> [[TMP15]], i64 0)
 // CPP-CHECK-NEXT:    [[TMP18:%.*]] = call <vscale x 8 x i16> @llvm.aarch64.sve.dupq.lane.nxv8i16(<vscale x 8 x i16> [[TMP17]], i64 0)
-// CPP-CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.cmpne.wide.nxv8i16(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i16> [[TM

[clang] 607fb1b - [AArch64] Always add -tune-cpu argument to -cc1 driver

2021-10-19 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2021-10-19T14:57:51+01:00
New Revision: 607fb1bb8c91a2f284d8c63f3066ab8cc1a66955

URL: 
https://github.com/llvm/llvm-project/commit/607fb1bb8c91a2f284d8c63f3066ab8cc1a66955
DIFF: 
https://github.com/llvm/llvm-project/commit/607fb1bb8c91a2f284d8c63f3066ab8cc1a66955.diff

LOG: [AArch64] Always add -tune-cpu argument to -cc1 driver

This patch ensures that we always tune for a given CPU on AArch64
targets when the user specifies the "-mtune=xyz" flag. In the
AArch64Subtarget if the tune flag is unset we use the CPU value
instead.
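
Illustrative sketch of that fallback (the AArch64Subtarget.cpp hunk is
truncated from this message, so the helper name here is invented):

```
#include "llvm/ADT/StringRef.h"
#include <string>

// If no tune CPU was supplied, tune for the selected CPU instead.
static std::string resolveTuneCPU(llvm::StringRef CPU, llvm::StringRef TuneCPU) {
  return std::string(TuneCPU.empty() ? CPU : TuneCPU);
}
```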

I've updated the release notes here:

  llvm/docs/ReleaseNotes.rst

and added tests here:

  clang/test/Driver/aarch64-mtune.c

Differential Revision: https://reviews.llvm.org/D110258

Added: 
clang/test/Driver/aarch64-mtune.c

Modified: 
clang/docs/ReleaseNotes.rst
clang/lib/Driver/ToolChains/Clang.cpp
llvm/docs/ReleaseNotes.rst
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
llvm/lib/Target/AArch64/AArch64Subtarget.h
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
llvm/unittests/Target/AArch64/InstSizes.cpp
llvm/unittests/Target/AArch64/MatrixRegisterAliasing.cpp

Removed: 




diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 27ff9ddc70a34..05bd9cfea3fa5 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -192,6 +192,13 @@ Arm and AArch64 Support in Clang
 
 - Support has been added for the following processors (command-line 
identifiers in parentheses):
   - Arm Cortex-A510 (``cortex-a510``)
+- The -mtune flag is no longer ignored for AArch64. It is now possible to
+tune code generation for a particular CPU with -mtune without setting any
+architectural features. For example, compiling with
+"-mcpu=generic -mtune=cortex-a57" will not enable any Cortex-A57 specific
+architecture features, but will enable certain optimizations specific to
+Cortex-A57 CPUs and enable the use of a more accurate scheduling model.
+
 
 Internal API Changes
 

diff  --git a/clang/lib/Driver/ToolChains/Clang.cpp 
b/clang/lib/Driver/ToolChains/Clang.cpp
index 316c6026adf5c..68b6950364583 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -1833,6 +1833,21 @@ void Clang::AddAArch64TargetArgs(const ArgList &Args,
   }
 
   AddAAPCSVolatileBitfieldArgs(Args, CmdArgs);
+
+  if (const Arg *A = Args.getLastArg(clang::driver::options::OPT_mtune_EQ)) {
+StringRef Name = A->getValue();
+
+std::string TuneCPU;
+if (Name == "native")
+  TuneCPU = std::string(llvm::sys::getHostCPUName());
+else
+  TuneCPU = std::string(Name);
+
+if (!TuneCPU.empty()) {
+  CmdArgs.push_back("-tune-cpu");
+  CmdArgs.push_back(Args.MakeArgString(TuneCPU));
+}
+  }
 }
 
 void Clang::AddMIPSTargetArgs(const ArgList &Args,

diff  --git a/clang/test/Driver/aarch64-mtune.c 
b/clang/test/Driver/aarch64-mtune.c
new file mode 100644
index 0..ae41f4a9983cd
--- /dev/null
+++ b/clang/test/Driver/aarch64-mtune.c
@@ -0,0 +1,42 @@
+// Ensure we support the -mtune flag.
+
+// There shouldn't be a default -mtune.
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=NOTUNE
+// NOTUNE-NOT: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=generic 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=GENERIC
+// GENERIC: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=neoverse-n1 
2>&1 \
+// RUN:   | FileCheck %s -check-prefix=NEOVERSE-N1
+// NEOVERSE-N1: "-tune-cpu" "neoverse-n1"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mtune=thunderx2t99 
2>&1 \
+// RUN:   | FileCheck %s -check-prefix=THUNDERX2T99
+// THUNDERX2T99: "-tune-cpu" "thunderx2t99"
+
+// Check interaction between march and mtune.
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -march=armv8-a 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=MARCHARMV8A
+// MARCHARMV8A: "-target-cpu" "generic"
+// MARCHARMV8A-NOT: "-tune-cpu" "generic"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -march=armv8-a 
-mtune=cortex-a75 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=MARCHARMV8A-A75
+// MARCHARMV8A-A75: "-target-cpu" "generic"
+// MARCHARMV8A-A75: "-tune-cpu" "cortex-a75"
+
+// Check interaction between mcpu and mtune.
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=thunderx 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=MCPUTHUNDERX
+// MCPUTHUNDERX: "-target-cpu" "thunderx"
+// MCPUTHUNDERX-NOT: "-tune-cpu"
+
+// RUN: %clang -target aarch64-unknown-unknown -c -### %s -mcpu=cortex-a75 
-mtune=cortex-a57 2>&1 \
+// RUN:   | FileCheck %s -check-prefix=MCPUA75-MTUNEA57
+// MCPUA75-MTUNEA57: "-target-cpu" "cortex-a75"
+// MCPUA75-MTUNEA57: "-tune-cpu" "cortex-a57"

diff  --git a/llvm/docs/ReleaseNote

[clang] 23db763 - Fix documentation errors introduced by 607fb1bb8c91a2f284d8c63f3066ab8cc1a66955

2021-10-19 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2021-10-19T15:12:03+01:00
New Revision: 23db763b7dadbf99cb46c66c855651ac760e56db

URL: 
https://github.com/llvm/llvm-project/commit/23db763b7dadbf99cb46c66c855651ac760e56db
DIFF: 
https://github.com/llvm/llvm-project/commit/23db763b7dadbf99cb46c66c855651ac760e56db.diff

LOG: Fix documentation errors introduced by 
607fb1bb8c91a2f284d8c63f3066ab8cc1a66955

Added: 


Modified: 
clang/docs/ReleaseNotes.rst
llvm/docs/ReleaseNotes.rst

Removed: 




diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 05bd9cfea3fa..11a039204e13 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -193,11 +193,11 @@ Arm and AArch64 Support in Clang
 - Support has been added for the following processors (command-line 
identifiers in parentheses):
   - Arm Cortex-A510 (``cortex-a510``)
 - The -mtune flag is no longer ignored for AArch64. It is now possible to
-tune code generation for a particular CPU with -mtune without setting any
-architectural features. For example, compiling with
-"-mcpu=generic -mtune=cortex-a57" will not enable any Cortex-A57 specific
-architecture features, but will enable certain optimizations specific to
-Cortex-A57 CPUs and enable the use of a more accurate scheduling model.
+  tune code generation for a particular CPU with -mtune without setting any
+  architectural features. For example, compiling with
+  "-mcpu=generic -mtune=cortex-a57" will not enable any Cortex-A57 specific
+  architecture features, but will enable certain optimizations specific to
+  Cortex-A57 CPUs and enable the use of a more accurate scheduling model.
 
 
 Internal API Changes

diff  --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index ec8a2e4ae882..1a9e63409b8d 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -75,9 +75,9 @@ Changes to the AArch64 Backend
 
 * Added support for the Armv9-A, Armv9.1-A and Armv9.2-A architectures.
 * The compiler now recognises the "tune-cpu" function attribute to support
-the use of the -mtune frontend flag. This allows certain scheduling features
-and optimisations to be enabled independently of the architecture. If the
-"tune-cpu" attribute is absent it tunes according to the "target-cpu".
+  the use of the -mtune frontend flag. This allows certain scheduling features
+  and optimisations to be enabled independently of the architecture. If the
+  "tune-cpu" attribute is absent it tunes according to the "target-cpu".
 
 Changes to the ARM Backend
 --





[clang] f988f68 - [Analysis] Add support for vscale in computeKnownBitsFromOperator

2021-09-20 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2021-09-20T15:01:59+01:00
New Revision: f988f680649ad38806897e7aa75e95e9fda88ffd

URL: 
https://github.com/llvm/llvm-project/commit/f988f680649ad38806897e7aa75e95e9fda88ffd
DIFF: 
https://github.com/llvm/llvm-project/commit/f988f680649ad38806897e7aa75e95e9fda88ffd.diff

LOG: [Analysis] Add support for vscale in computeKnownBitsFromOperator

In ValueTracking.cpp we use a function called
computeKnownBitsFromOperator to determine the known bits of a value.
For the vscale intrinsic if the function contains the vscale_range
attribute we can use the maximum and minimum values of vscale to
determine some known zero and one bits. This should help to improve
code quality by allowing certain optimisations to take place.
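
As a rough illustration of the known-zero part (this is not the exact
ValueTracking.cpp code; the helper name is invented): any bit above the highest
set bit of the vscale_range maximum can never be set in vscale.

```
#include <cstdint>

// Returns a mask of bits that are provably zero in any vscale value
// bounded above by MaxVScale (from the vscale_range attribute).
uint64_t knownZeroMaskForVScale(uint64_t MaxVScale) {
  uint64_t Mask = 0;
  for (int Bit = 63; Bit >= 0; --Bit) {
    if ((MaxVScale >> Bit) & 1)
      break;               // reached the top set bit of the maximum
    Mask |= 1ULL << Bit;   // vscale can never have this bit set
  }
  return Mask;
}
```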

Tests added here:

  Transforms/InstCombine/icmp-vscale.ll

Differential Revision: https://reviews.llvm.org/D109883

Added: 
llvm/test/Transforms/InstCombine/icmp-vscale.ll

Modified: 
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cnth.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntw.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_len-bfloat.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_len.c
llvm/lib/Analysis/ValueTracking.cpp
llvm/test/Transforms/InstSimplify/vscale.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Removed: 




diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c
index a5991a5a0151..e73751a2fe86 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntb.c
@@ -8,13 +8,13 @@
 // CHECK-LABEL: @test_svcntb(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 4
+// CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 4
 // CHECK-NEXT:ret i64 [[TMP1]]
 //
 // CPP-CHECK-LABEL: @_Z11test_svcntbv(
 // CPP-CHECK-NEXT:  entry:
 // CPP-CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 4
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 4
 // CPP-CHECK-NEXT:ret i64 [[TMP1]]
 //
 uint64_t test_svcntb()
@@ -247,13 +247,13 @@ uint64_t test_svcntb_pat_15()
 // CHECK-LABEL: @test_svcntb_pat_16(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 4
+// CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 4
 // CHECK-NEXT:ret i64 [[TMP1]]
 //
 // CPP-CHECK-LABEL: @_Z18test_svcntb_pat_16v(
 // CPP-CHECK-NEXT:  entry:
 // CPP-CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 4
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 4
 // CPP-CHECK-NEXT:ret i64 [[TMP1]]
 //
 uint64_t test_svcntb_pat_16()

diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c
index eb3e848a7b53..1e8dc401fa1e 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cntd.c
@@ -8,13 +8,13 @@
 // CHECK-LABEL: @test_svcntd(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 1
+// CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 1
 // CHECK-NEXT:ret i64 [[TMP1]]
 //
 // CPP-CHECK-LABEL: @_Z11test_svcntdv(
 // CPP-CHECK-NEXT:  entry:
 // CPP-CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 1
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 1
 // CPP-CHECK-NEXT:ret i64 [[TMP1]]
 //
 uint64_t test_svcntd()
@@ -261,13 +261,13 @@ uint64_t test_svcntd_pat_15()
 // CHECK-LABEL: @test_svcntd_pat_16(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 1
+// CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 1
 // CHECK-NEXT:ret i64 [[TMP1]]
 //
 // CPP-CHECK-LABEL: @_Z18test_svcntd_pat_16v(
 // CPP-CHECK-NEXT:  entry:
 // CPP-CHECK-NEXT:[[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl i64 [[TMP0]], 1
+// CPP-CHECK-NEXT:[[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 1
 // CPP-CHECK-NEXT:ret i64 [[TMP1]]
 //
 uint64_t test_svcntd_pat_16()

diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cnth.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cnth.c
index 50ca8f525387..27a2fdca1abc 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cnth.c
+++ b/clang/

[clang] bfb6f47 - [SVE] Change some bfloat lane intrinsics to use i32 immediates

2022-12-07 Thread David Sherwood via cfe-commits

Author: David Sherwood
Date: 2022-12-07T09:19:54Z
New Revision: bfb6f47e9ea463555833934ef714b03ee78eb01e

URL: 
https://github.com/llvm/llvm-project/commit/bfb6f47e9ea463555833934ef714b03ee78eb01e
DIFF: 
https://github.com/llvm/llvm-project/commit/bfb6f47e9ea463555833934ef714b03ee78eb01e.diff

LOG: [SVE] Change some bfloat lane intrinsics to use i32 immediates

Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.

I've maintained backwards compatibility with the old i64 variants
and used the autoupgrade mechanism.
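
For reference, a short (illustrative) ACLE example of one affected builtin;
the C/C++ interface is unchanged by this patch, only the underlying LLVM
intrinsic's immediate type moves from i64 to i32:

```
#include <arm_sve.h>

// Requires SVE BF16 support; the lane immediate must be in [0, 3].
svfloat32_t dot_lane(svfloat32_t acc, svbfloat16_t x, svbfloat16_t y) {
  return svbfdot_lane_f32(acc, x, y, 3);
}
```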

Differential Revision: https://reviews.llvm.org/D138788

Added: 


Modified: 
clang/include/clang/Basic/arm_sve.td
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c
llvm/include/llvm/IR/IntrinsicsAArch64.td
llvm/lib/IR/AutoUpgrade.cpp
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
llvm/lib/Target/AArch64/SVEInstrFormats.td
llvm/test/Bitcode/upgrade-aarch64-sve-intrinsics.ll
llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

Removed: 




diff  --git a/clang/include/clang/Basic/arm_sve.td 
b/clang/include/clang/Basic/arm_sve.td
index 175b572ffdab8..6c24f04232382 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -537,9 +537,9 @@ let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
   def SVBFDOT_N  : SInst<"svbfdot[_n_{0}]",  "MMda",  "b", MergeNone, 
"aarch64_sve_bfdot",[IsOverloadNone]>;
   def SVBFMLAL_N : SInst<"svbfmlalb[_n_{0}]","MMda",  "b", MergeNone, 
"aarch64_sve_bfmlalb",  [IsOverloadNone]>;
   def SVBFMLALT_N: SInst<"svbfmlalt[_n_{0}]","MMda",  "b", MergeNone, 
"aarch64_sve_bfmlalt",  [IsOverloadNone]>;
-  def SVBFDOT_LANE   : SInst<"svbfdot_lane[_{0}]",   "MMddn", "b", MergeNone, 
"aarch64_sve_bfdot_lane",   [IsOverloadNone], [ImmCheck<3, ImmCheck0_3>]>;
-  def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddn", "b", MergeNone, 
"aarch64_sve_bfmlalb_lane", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
-  def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddn", "b", MergeNone, 
"aarch64_sve_bfmlalt_lane", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
+  def SVBFDOT_LANE   : SInst<"svbfdot_lane[_{0}]",   "MMddi", "b", MergeNone, 
"aarch64_sve_bfdot_lane_v2",   [IsOverloadNone], [ImmCheck<3, ImmCheck0_3>]>;
+  def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddi", "b", MergeNone, 
"aarch64_sve_bfmlalb_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
+  def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, 
"aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
 }
 
 


diff  --git a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c 
b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
index 7735a3173d38a..454b4b546a9d5 100644
--- a/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
+++ b/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
@@ -31,12 +31,12 @@ svfloat32_t test_bfdot_f32(svfloat32_t x, svbfloat16_t y, 
svbfloat16_t z) {
 
 // CHECK-LABEL: @test_bfdot_lane_0_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i64 0)
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane.v2(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i32 0)
 // CHECK-NEXT:    ret <vscale x 4 x float> [[TMP0]]
 //
 // CPP-CHECK-LABEL: @_Z21test_bfdot_lane_0_f32u13__SVFloat32_tu14__SVBFloat16_tu14__SVBFloat16_t(
 // CPP-CHECK-NEXT:  entry:
-// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane.v2(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i32 0)
 // CPP-CHECK-NEXT:    ret <vscale x 4 x float> [[TMP0]]
 //
 svfloat32_t test_bfdot_lane_0_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
@@ -45,12 +45,12 @@ svfloat32_t test_bfdot_lane_0_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z)
 
 // CHECK-LABEL: @test_bfdot_lane_3_f32(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i64 3)
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane.v2(<vscale x 4 x float> [[X:%.*]], <vscale x 8 x bfloat> [[Y:%.*]], <vscale x 8 x bfloat> [[Z:%.*]], i32 3)
 // CHECK-NEXT:    ret <vscale x 4 x float> [[TMP0]]
 //
 // CPP-CHECK-LABEL: @_Z21test_bfdot_lane_3_f32u13__SVFloat32_tu14__SVBFloat16_tu14__SVBFloat16_t(
 // CPP-CHECK-NEXT:  entry:
-/

[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread David Sherwood via cfe-commits

https://github.com/david-arm commented:

Wow, this is a huge patch. :) It took me a few hours to work through all the 
tests, and it's quite possible I've missed something. However, overall it looks 
fine and I can't see any major issues. I think there is one missing test, but 
once that's fixed I'll approve it!

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/70474


[clang] [llvm] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread David Sherwood via cfe-commits


@@ -0,0 +1,2503 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include <arm_sve.h>
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4
+#endif
+
+// CHECK-LABEL: @test_svld2q_u8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+// CPP-CHECK-LABEL: @_Z14test_svld2q_u8u10__SVBool_tPKh(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CPP-CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CPP-CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+svuint8x2_t test_svld2q_u8(svbool_t pg, const uint8_t *base)
+{
+  return SVE_ACLE_FUNC(svld2q,,_u8,)(pg, base);
+}
+
+// CHECK-LABEL: @test_svld2q_s8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+// CPP-CHECK-LABEL: @_Z14test_svld2q_s8u10__SVBool_tPKa(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CPP-CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CPP-CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+svint8x2_t test_svld2q_s8(svbool_t pg, const int8_t *base)
+{
+  return SVE_ACLE_FUNC(svld2q,,_s8,)(pg, base);
+}
+// CHECK-LABEL: @test_svld2q_u16(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call { <vscale x 8 x i16>, <vscale x 8 x i16> } @llvm.aarch64.sve.ld2q.sret.nxv8i16(<vscale x 8 x i1> [[TMP0]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP2:%.*]] = extractvalue { <vscale x 8 x i16>, <vscale x 8 x i16> } [[TMP1]], 0
+// CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 16 x i16> @llvm.vector.insert.nxv16i16.nxv8i16(<vscale x 16 x i16> poison, <vscale x 8 x i16> [[TMP2]], i64 0)
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { <vscale x 8 x i16>, <vscale x 8 x i16> } [[TMP1]], 1
+// CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 16 x i16> @llvm.vector.insert.nxv16i16.nxv8i16(<vscale x 16 x i16> [[TMP3]], <vscale x 8 x i16> [[TMP4]], i64 8)
+// CHECK-NEXT:    ret <vscale x 16 x i16> [[TMP5]]
+//
+// CPP-CHECK-LABEL: @_Z15test_svld2q_u16u10__SVBool_tPKt(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])

[clang-tools-extra] [clang] [llvm] [LoopVectorize] Improve algorithm for hoisting runtime checks (PR #73515)

2023-11-30 Thread David Sherwood via cfe-commits

https://github.com/david-arm updated 
https://github.com/llvm/llvm-project/pull/73515

>From 30251642f8c208c63f3f3097c337ef0d5bc633b5 Mon Sep 17 00:00:00 2001
From: David Sherwood 
Date: Mon, 27 Nov 2023 13:43:26 +
Subject: [PATCH 1/3] [LoopVectorize] Improve algorithm for hoisting runtime
 checks

When attempting to hoist runtime checks out of a loop we currently
avoid creating pointer diff checks and prefer to do expanded range
checks instead. This gives us the opportunity to hoist runtime
checks out of a loop, since these checks are loop invariant. However,
in some cases the pointer diff checks would also be loop invariant
and so will naturally get hoisted. Therefore, since diff checks are
cheaper, we should prefer to use those instead.
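
For context, an illustrative loop nest (modelled on the @diff_checks test
below, not taken verbatim from the patch) where the single pointer-difference
check is invariant in the outer loop and so can be hoisted:

```
void diff_checks(int *dst, int *src, int m, int n) {
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      dst[i * n + j] = src[i * n + j]; // (dst - src) check hoists above 'i'
}
```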
---
 llvm/lib/Analysis/LoopAccessAnalysis.cpp  |   5 +-
 .../LoopVectorize/runtime-checks-hoist.ll | 143 ++
 2 files changed, 121 insertions(+), 27 deletions(-)

diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3d1edd5f038a25e..057652233979876 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -346,7 +346,10 @@ void RuntimePointerChecking::tryToCreateDiffCheck(
 auto *SinkStartAR = cast(SinkStartInt);
 const Loop *StartARLoop = SrcStartAR->getLoop();
 if (StartARLoop == SinkStartAR->getLoop() &&
-StartARLoop == InnerLoop->getParentLoop()) {
+StartARLoop == InnerLoop->getParentLoop() &&
+!SE->isKnownPredicate(ICmpInst::ICMP_EQ,
+  SrcStartAR->getStepRecurrence(*SE),
+  SinkStartAR->getStepRecurrence(*SE))) {
   LLVM_DEBUG(dbgs() << "LAA: Not creating diff runtime check, since these "
"cannot be hoisted out of the outer loop\n");
   CanUseDiffCheck = false;
diff --git a/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll 
b/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
index 891597cbdc48a8f..81702bf34e96bed 100644
--- a/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
+++ b/llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
@@ -69,11 +69,11 @@ define void @diff_checks(ptr nocapture noundef writeonly 
%dst, ptr nocapture nou
 ; CHECK-NEXT:[[TMP14:%.*]] = add nuw nsw i64 [[TMP13]], [[TMP10]]
 ; CHECK-NEXT:[[TMP15:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 
[[TMP14]]
 ; CHECK-NEXT:[[TMP16:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], 
i32 0
-; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP16]], align 4, 
!alias.scope !0
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP16]], align 4, 
!alias.scope [[META0:![0-9]+]]
 ; CHECK-NEXT:[[TMP17:%.*]] = add nsw i64 [[TMP13]], [[TMP11]]
 ; CHECK-NEXT:[[TMP18:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 
[[TMP17]]
 ; CHECK-NEXT:[[TMP19:%.*]] = getelementptr inbounds i32, ptr [[TMP18]], 
i32 0
-; CHECK-NEXT:store <4 x i32> [[WIDE_LOAD]], ptr [[TMP19]], align 4, 
!alias.scope !3, !noalias !0
+; CHECK-NEXT:store <4 x i32> [[WIDE_LOAD]], ptr [[TMP19]], align 4, 
!alias.scope [[META3:![0-9]+]], !noalias [[META0]]
 ; CHECK-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:[[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
@@ -189,12 +189,12 @@ define void @full_checks(ptr nocapture noundef %dst, ptr 
nocapture noundef reado
 ; CHECK-NEXT:[[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], [[TMP3]]
 ; CHECK-NEXT:[[TMP6:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 
[[TMP5]]
 ; CHECK-NEXT:[[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i32 0
-; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP7]], align 4, 
!alias.scope !9
+; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP7]], align 4, 
!alias.scope [[META9:![0-9]+]]
 ; CHECK-NEXT:[[TMP8:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 
[[TMP5]]
 ; CHECK-NEXT:[[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i32 0
-; CHECK-NEXT:[[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP9]], align 4, 
!alias.scope !12, !noalias !9
+; CHECK-NEXT:[[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP9]], align 4, 
!alias.scope [[META12:![0-9]+]], !noalias [[META9]]
 ; CHECK-NEXT:[[TMP10:%.*]] = add nsw <4 x i32> [[WIDE_LOAD2]], 
[[WIDE_LOAD]]
-; CHECK-NEXT:store <4 x i32> [[TMP10]], ptr [[TMP9]], align 4, 
!alias.scope !12, !noalias !9
+; CHECK-NEXT:store <4 x i32> [[TMP10]], ptr [[TMP9]], align 4, 
!alias.scope [[META12]], !noalias [[META9]]
 ; CHECK-NEXT:[[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:[[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label 
[[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
@@ -319,13 +319,13 @@ define void @ful

[clang] [llvm] [SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (PR #73317)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -1859,6 +1867,34 @@ void AArch64DAGToDAGISel::SelectFrintFromVT(SDNode *N, unsigned NumVecs,
   SelectUnaryMultiIntrinsic(N, NumVecs, true, Opcode);
 }
 
+template <unsigned Max>

david-arm wrote:

Rather than create two almost identical copies of the function with a template 
parameter, I think in this case it makes sense to just pass Max in as a 
function argument.
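
A sketch of the suggestion (illustrative; the names come from the patch under
review, and MaxImm is the invented replacement for the template parameter):

```
void AArch64DAGToDAGISel::SelectMultiVectorLuti(SDNode *Node,
                                                unsigned NumOutVecs,
                                                unsigned Opc, unsigned MaxImm) {
  if (auto *Imm = dyn_cast<ConstantSDNode>(Node->getOperand(4)))
    if (Imm->getZExtValue() > MaxImm)
      return; // immediate out of range for this LUTI variant
  // ... rest of the selection logic is unchanged ...
}
```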

https://github.com/llvm/llvm-project/pull/73317


[llvm] [clang] [SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (PR #73317)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -1859,6 +1867,34 @@ void AArch64DAGToDAGISel::SelectFrintFromVT(SDNode *N, unsigned NumVecs,
   SelectUnaryMultiIntrinsic(N, NumVecs, true, Opcode);
 }
 
+template <unsigned Max>
+void AArch64DAGToDAGISel::SelectMultiVectorLuti(SDNode *Node,
+                                                unsigned NumOutVecs,
+                                                unsigned Opc) {
+  if (ConstantSDNode *Imm = dyn_cast<ConstantSDNode>(Node->getOperand(4)))
+    if (Imm->getZExtValue() > Max)
+      return;
+
+  SDValue ZtValue;
+  ImmToTile(Node->getOperand(2), ZtValue);

david-arm wrote:

If someone invokes the intrinsic with Op2 != 0 this will likely crash. Is it 
worth asserting the result of ImmToTile is true so that at least it's more 
obvious?
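
Something like this (illustrative; ImmToTile and its bool result come from the
patch under review):

```
SDValue ZtValue;
bool ValidTile = ImmToTile(Node->getOperand(2), ZtValue);
assert(ValidTile && "Expected an operand that maps to ZT0");
```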

https://github.com/llvm/llvm-project/pull/73317


[clang] [llvm] [SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (PR #73317)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -0,0 +1,280 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -target-feature +sve -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -target-feature +sve -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+// CHECK-LABEL: @test_svluti2_lane_zt_u8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sme.luti2.lane.zt.x4.nxv16i8(i32 0, <vscale x 16 x i8> [[ZN:%.*]], i32 0)
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 2
+// CHECK-NEXT:    [[TMP6:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP4]], <vscale x 16 x i8> [[TMP5]], i64 32)
+// CHECK-NEXT:    [[TMP7:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 3
+// CHECK-NEXT:    [[TMP8:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP6]], <vscale x 16 x i8> [[TMP7]], i64 48)
+// CHECK-NEXT:    ret <vscale x 64 x i8> [[TMP8]]
+//
+// CPP-CHECK-LABEL: @_Z23test_svluti2_lane_zt_u8u11__SVUint8_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sme.luti2.lane.zt.x4.nxv16i8(i32 0, <vscale x 16 x i8> [[ZN:%.*]], i32 0)
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CPP-CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CPP-CHECK-NEXT:    [[TMP5:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 2
+// CPP-CHECK-NEXT:    [[TMP6:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP4]], <vscale x 16 x i8> [[TMP5]], i64 32)
+// CPP-CHECK-NEXT:    [[TMP7:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 3
+// CPP-CHECK-NEXT:    [[TMP8:%.*]] = tail call <vscale x 64 x i8> @llvm.vector.insert.nxv64i8.nxv16i8(<vscale x 64 x i8> [[TMP6]], <vscale x 16 x i8> [[TMP7]], i64 48)
+// CPP-CHECK-NEXT:    ret <vscale x 64 x i8> [[TMP8]]
+//
+svuint8x4_t test_svluti2_lane_zt_u8(svuint8_t zn) __arm_streaming __arm_shared_za __arm_preserves_za {

david-arm wrote:

For all of the functions in both test files shouldn't we also be testing the 
overloaded forms of the builtins?

I'd expected to see 5 RUN lines in total for each file

https://github.com/llvm/llvm-project/pull/73317


[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #72849)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS

david-arm wrote:

Can delete these lines.

https://github.com/llvm/llvm-project/pull/72849


[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #72849)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s

david-arm wrote:

I think in this case we can kill off the RUN lines for the overloaded forms 
because in the ACLE they are never overloaded.

https://github.com/llvm/llvm-project/pull/72849


[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #72849)

2023-11-30 Thread David Sherwood via cfe-commits


@@ -2748,6 +2748,22 @@ AArch64TargetLowering::EmitFill(MachineInstr &MI, 
MachineBasicBlock *BB) const {
   return BB;
 }
 
+MachineBasicBlock *AArch64TargetLowering::EmitZTSpillFill(MachineInstr &MI,
+  MachineBasicBlock 
*BB,
+  bool IsSpill) const {
+  const TargetInstrInfo *TII = Subtarget->getInstrInfo();
+  MachineInstrBuilder MIB;
+  if (IsSpill) {

david-arm wrote:

I think this can be simplified to

```
  unsigned Opc = IsSpill ? AArch64::STR_TX : AArch64::LDR_TX;
  MIB = BuildMI(*BB, MI, MI.getDebugLoc(), TII->get(Opc));
  MIB.addReg(MI.getOperand(0).getReg());
  MIB.add(MI.getOperand(1)); // Base
```

https://github.com/llvm/llvm-project/pull/72849


[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #72849)

2023-11-30 Thread David Sherwood via cfe-commits

david-arm wrote:

It looks like a few other pull requests are changing the same code around 
ImmToTile. Might be good to land this smaller patch first so you can rebase the 
others and reduce the diffs!

https://github.com/llvm/llvm-project/pull/72849


[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #72849)

2023-11-30 Thread David Sherwood via cfe-commits

https://github.com/david-arm approved this pull request.

LGTM! Perfect!

https://github.com/llvm/llvm-project/pull/72849

