[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-21 Thread Momchil Velikov via cfe-commits

https://github.com/momchil-velikov edited 
https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread Momchil Velikov via cfe-commits


@@ -0,0 +1,2503 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include <arm_sve.h>
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4
+#endif
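+// For instance, a hypothetical SVE_ACLE_FUNC(svfoo,_u8,,) would expand to the
+// overloaded name "svfoo" under SVE_OVERLOADED_FORMS, and to the fully
+// suffixed "svfoo_u8" otherwise (svfoo is a placeholder, not a real builtin).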
+
+// CHECK-LABEL: @test_svld2q_u8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+// CPP-CHECK-LABEL: @_Z14test_svld2q_u8u10__SVBool_tPKh(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CPP-CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CPP-CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+svuint8x2_t test_svld2q_u8(svbool_t pg, const uint8_t *base)
+{
+  return SVE_ACLE_FUNC(svld2q,,_u8,)(pg, base);
+}
+
+// CHECK-LABEL: @test_svld2q_s8(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+// CPP-CHECK-LABEL: @_Z14test_svld2q_s8u10__SVBool_tPKa(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call { <vscale x 16 x i8>, <vscale x 16 x i8> } @llvm.aarch64.sve.ld2q.sret.nxv16i8(<vscale x 16 x i1> [[PG:%.*]], ptr [[BASE:%.*]])
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> poison, <vscale x 16 x i8> [[TMP1]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP3:%.*]] = extractvalue { <vscale x 16 x i8>, <vscale x 16 x i8> } [[TMP0]], 1
+// CPP-CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 32 x i8> @llvm.vector.insert.nxv32i8.nxv16i8(<vscale x 32 x i8> [[TMP2]], <vscale x 16 x i8> [[TMP3]], i64 16)
+// CPP-CHECK-NEXT:    ret <vscale x 32 x i8> [[TMP4]]
+//
+svint8x2_t test_svld2q_s8(svbool_t pg, const int8_t *base)
+{
+  return SVE_ACLE_FUNC(svld2q,,_s8,)(pg, base);
+}
+// CHECK-LABEL: @test_svld2q_u16(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call { <vscale x 8 x i16>, <vscale x 8 x i16> } @llvm.aarch64.sve.ld2q.sret.nxv8i16(<vscale x 8 x i1> [[TMP0]], ptr [[BASE:%.*]])
+// CHECK-NEXT:    [[TMP2:%.*]] = extractvalue { <vscale x 8 x i16>, <vscale x 8 x i16> } [[TMP1]], 0
+// CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 16 x i16> @llvm.vector.insert.nxv16i16.nxv8i16(<vscale x 16 x i16> poison, <vscale x 8 x i16> [[TMP2]], i64 0)
+// CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { <vscale x 8 x i16>, <vscale x 8 x i16> } [[TMP1]], 1
+// CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 16 x i16> @llvm.vector.insert.nxv16i16.nxv8i16(<vscale x 16 x i16> [[TMP3]], <vscale x 8 x i16> [[TMP4]], i64 8)
+// CHECK-NEXT:    ret <vscale x 16 x i16> [[TMP5]]
+//
+// CPP-CHECK-LABEL: @_Z15test_svld2q_u16u10__SVBool_tPKt(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[PG:%.*]])

[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread David Sherwood via cfe-commits

https://github.com/david-arm edited 
https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-15 Thread David Sherwood via cfe-commits

https://github.com/david-arm commented:

Wow, this is a huge patch. :) It took me a few hours to work through all the 
tests, and it's quite possible I've missed something. However, overall it looks 
fine and I can't see any major issues. I think there is one missing test, but 
once that's fixed I'll approve it!

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread via cfe-commits

github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff e6bd68c90b1a386e276f53ba28fdfdfda48bcea1 a3b7e136e2f045a1c9948b679da89ec9a406516e -- clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_ld1_single.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_loads.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_st1_single.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_store.c clang/lib/CodeGen/CGBuiltin.cpp llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e041c78b1e03..02e3d3b5cece 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -10293,7 +10293,7 @@ AArch64TargetLowering::getConstraintType(StringRef Constraint) const {
     }
   } else if (parsePredicateConstraint(Constraint) !=
                  PredicateConstraint::Invalid)
-      return C_RegisterClass;
+    return C_RegisterClass;
   else if (parseConstraintCode(Constraint) != AArch64CC::Invalid)
     return C_Other;
   return TargetLowering::getConstraintType(Constraint);
```

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread Momchil Velikov via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;
+  switch (IntrinsicID) {
+  case Intrinsic::aarch64_sve_ld1uwq:
+  case Intrinsic::aarch64_sve_ld1udq:
+    MemoryTy = llvm::ScalableVectorType::get(MemEltTy, 1);
+    PredTy =
+        llvm::ScalableVectorType::get(IntegerType::get(getLLVMContext(), 1), 1);

momchil-velikov wrote:

One always falls through the cracks. Will fix it eventually.

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread Momchil Velikov via cfe-commits


@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 
+//
+// SVE2.1 - Contiguous loads to quadword (single vector)
+//
+
+class SVE2p1_Single_Load_Quadword
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                            [llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrReadMem]>;
+def int_aarch64_sve_ld1uwq : SVE2p1_Single_Load_Quadword;
+def int_aarch64_sve_ld1udq : SVE2p1_Single_Load_Quadword;
+
+//
+// SVE2.1 - Contiguous store from quadword (single vector)
+//
+
+class SVE2p1_Single_Store_Quadword
+    : DefaultAttrsIntrinsic<[],
+                            [llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrArgMemOnly]>;

momchil-velikov wrote:

Done

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread Momchil Velikov via cfe-commits


@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
   auto VectorTy = cast<llvm::ScalableVectorType>(Ops.back()->getType());
   auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
 
-  Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy);
+  auto PredTy = MemoryTy;
+  auto AddrMemoryTy = MemoryTy;
+  bool IsTruncatingStore = true;

momchil-velikov wrote:

Done

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread Momchil Velikov via cfe-commits


@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 
+//
+// SVE2.1 - Contiguous loads to quadword (single vector)
+//
+
+class SVE2p1_Single_Load_Quadword
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                            [llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrReadMem]>;

momchil-velikov wrote:

Done

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-04 Thread Momchil Velikov via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;

momchil-velikov wrote:

Done

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic;
 def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic;
 
+//
+// SVE2.1 - Contiguous loads to quadword (single vector)
+//
+
+class SVE2p1_Single_Load_Quadword
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                            [llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrReadMem]>;
+def int_aarch64_sve_ld1uwq : SVE2p1_Single_Load_Quadword;
+def int_aarch64_sve_ld1udq : SVE2p1_Single_Load_Quadword;
+
+//
+// SVE2.1 - Contiguous store from quadword (single vector)
+//
+
+class SVE2p1_Single_Store_Quadword
+    : DefaultAttrsIntrinsic<[],
+                            [llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty],
+                            [IntrArgMemOnly]>;

david-arm wrote:

This also needs the IntrWriteMem flag, otherwise we could end up incorrectly rescheduling stores.
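
For reference, a minimal sketch of the amended definition (assuming the new flag simply joins IntrArgMemOnly in the attribute list):

```tablegen
// Store intrinsic: marked as writing memory, and as only accessing memory
// through its pointer argument.
class SVE2p1_Single_Store_Quadword
    : DefaultAttrsIntrinsic<[],
                            [llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty],
                            [IntrWriteMem, IntrArgMemOnly]>;
```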

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;
+  switch (IntrinsicID) {
+  case Intrinsic::aarch64_sve_ld1uwq:
+  case Intrinsic::aarch64_sve_ld1udq:
+    MemoryTy = llvm::ScalableVectorType::get(MemEltTy, 1);
+    PredTy =
+        llvm::ScalableVectorType::get(IntegerType::get(getLLVMContext(), 1), 1);

david-arm wrote:

You can just do:

  llvm::ScalableVectorType::get(Type::getInt1Ty(getLLVMContext()), 1);
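
That is, a sketch of the suggested simplification (the PredTy assignment is taken from the quoted diff; Type::getInt1Ty is the standard LLVM helper for the i1 type):

```cpp
// Before: spells out the i1 element type by bit width.
PredTy = llvm::ScalableVectorType::get(
    IntegerType::get(getLLVMContext(), 1), 1);

// After: the same <vscale x 1 x i1> type via the dedicated helper.
PredTy = llvm::ScalableVectorType::get(
    Type::getInt1Ty(getLLVMContext()), 1);
```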

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E,
   auto VectorTy = cast<llvm::ScalableVectorType>(Ops.back()->getType());
   auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
 
-  Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy);
+  auto PredTy = MemoryTy;
+  auto AddrMemoryTy = MemoryTy;
+  bool IsTruncatingStore = true;
+  ;

david-arm wrote:

Extra ; here

https://github.com/llvm/llvm-project/pull/70474


[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)

2023-11-03 Thread David Sherwood via cfe-commits


@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E,
   // The vector type that is returned may be different from the
   // eventual type loaded from memory.
   auto VectorTy = cast<llvm::ScalableVectorType>(ReturnTy);
-  auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy);
+  llvm::ScalableVectorType *MemoryTy = nullptr;
+  llvm::ScalableVectorType *PredTy = nullptr;
+  bool IsExtendingLoad = true;

david-arm wrote:

I personally think using this variable is misleading, because aarch64_sve_ld1uwq actually is an extending load - we're extending from 32-bit memory elements to 128-bit integer elements. So it looks odd when we set this to false. Perhaps it's better to just explicitly have a variable called `IsQuadLoad` and use that instead, rather than trying to generalise this. The quad-word loads are really just an exception here, because we're working around the lack of a <vscale x 1 x i128> type. So you'd have something like:

  case Intrinsic::aarch64_sve_ld1uwq:
    IsQuadLoad = true;
    ...
  default:
    IsQuadLoad = false;

  Function *F =
      CGM.getIntrinsic(IntrinsicID, IsQuadLoad ? VectorTy : MemoryTy);

  ...

  if (IsQuadLoad)
    return Load;
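
Fleshed out, the suggested shape might read like this (a sketch only; the case labels and names come from the quoted diff, the surrounding plumbing is assumed):

```cpp
// Quad-word loads are the odd ones out: they load 128-bit elements and are
// emitted directly on the full return type, so they skip the usual
// memory-type/extend handling below.
bool IsQuadLoad = false;
switch (IntrinsicID) {
case Intrinsic::aarch64_sve_ld1uwq:
case Intrinsic::aarch64_sve_ld1udq:
  IsQuadLoad = true;
  break;
default:
  IsQuadLoad = false;
  break;
}

// Instantiate on the return type for quad loads, and on the (narrower)
// memory type otherwise.
Function *F = CGM.getIntrinsic(IntrinsicID, IsQuadLoad ? VectorTy : MemoryTy);

// ... emit the call as before, producing `Load` ...

// Only non-quad extending loads still need the extend to the return type.
if (IsQuadLoad)
  return Load;
```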

https://github.com/llvm/llvm-project/pull/70474