[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
https://github.com/momchil-velikov edited https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -0,0 +1,2503 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: aarch64-registered-target +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -target-feature +bf16 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s + +#include + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED) A1##A3 +#else +#define SVE_ACLE_FUNC(A1,A2,A3,A4) A1##A2##A3##A4 +#endif + +// CHECK-LABEL: @test_svld2q_u8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[TMP0:%.*]] = tail call { , } @llvm.aarch64.sve.ld2q.sret.nxv16i8( [[PG:%.*]], ptr [[BASE:%.*]]) +// CHECK-NEXT:[[TMP1:%.*]] = extractvalue { , } [[TMP0]], 0 +// CHECK-NEXT:[[TMP2:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( poison, [[TMP1]], i64 0) +// CHECK-NEXT:[[TMP3:%.*]] = extractvalue { , } [[TMP0]], 1 +// CHECK-NEXT:[[TMP4:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( [[TMP2]], [[TMP3]], i64 16) +// CHECK-NEXT:ret [[TMP4]] +// +// CPP-CHECK-LABEL: @_Z14test_svld2q_u8u10__SVBool_tPKh( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call { , } @llvm.aarch64.sve.ld2q.sret.nxv16i8( [[PG:%.*]], ptr [[BASE:%.*]]) +// CPP-CHECK-NEXT:[[TMP1:%.*]] = extractvalue { , } [[TMP0]], 0 +// CPP-CHECK-NEXT:[[TMP2:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( poison, [[TMP1]], i64 0) +// CPP-CHECK-NEXT:[[TMP3:%.*]] = extractvalue { , } [[TMP0]], 1 +// CPP-CHECK-NEXT:[[TMP4:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( [[TMP2]], [[TMP3]], i64 16) +// CPP-CHECK-NEXT:ret [[TMP4]] +// +svuint8x2_t test_svld2q_u8(svbool_t pg, const uint8_t *base) +{ + return SVE_ACLE_FUNC(svld2q,,_u8,)(pg, base); +} + +// CHECK-LABEL: @test_svld2q_s8( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[TMP0:%.*]] = tail call { , } @llvm.aarch64.sve.ld2q.sret.nxv16i8( [[PG:%.*]], ptr [[BASE:%.*]]) +// CHECK-NEXT:[[TMP1:%.*]] = extractvalue { , } [[TMP0]], 0 +// CHECK-NEXT:[[TMP2:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( poison, [[TMP1]], i64 0) +// CHECK-NEXT:[[TMP3:%.*]] = extractvalue { , } [[TMP0]], 1 +// CHECK-NEXT:[[TMP4:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( [[TMP2]], [[TMP3]], i64 16) +// CHECK-NEXT:ret [[TMP4]] +// +// CPP-CHECK-LABEL: @_Z14test_svld2q_s8u10__SVBool_tPKa( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call { , } @llvm.aarch64.sve.ld2q.sret.nxv16i8( [[PG:%.*]], ptr [[BASE:%.*]]) +// CPP-CHECK-NEXT:[[TMP1:%.*]] = extractvalue { , } [[TMP0]], 0 +// CPP-CHECK-NEXT:[[TMP2:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( poison, [[TMP1]], i64 0) +// CPP-CHECK-NEXT:[[TMP3:%.*]] = extractvalue { , } [[TMP0]], 1 +// CPP-CHECK-NEXT:[[TMP4:%.*]] = tail call @llvm.vector.insert.nxv32i8.nxv16i8( [[TMP2]], [[TMP3]], i64 16) +// CPP-CHECK-NEXT:ret [[TMP4]] +// +svint8x2_t test_svld2q_s8(svbool_t pg, const int8_t *base) +{ + return SVE_ACLE_FUNC(svld2q,,_s8,)(pg, base); +} +// CHECK-LABEL: @test_svld2q_u16( +// CHECK-NEXT: entry: +// CHECK-NEXT:[[TMP0:%.*]] = tail call @llvm.aarch64.sve.convert.from.svbool.nxv8i1( [[PG:%.*]]) +// CHECK-NEXT:[[TMP1:%.*]] = tail call { , } @llvm.aarch64.sve.ld2q.sret.nxv8i16( [[TMP0]], ptr [[BASE:%.*]]) +// CHECK-NEXT:[[TMP2:%.*]] = extractvalue { , } [[TMP1]], 0 +// CHECK-NEXT:[[TMP3:%.*]] = tail call @llvm.vector.insert.nxv16i16.nxv8i16( poison, [[TMP2]], i64 0) +// CHECK-NEXT:[[TMP4:%.*]] = extractvalue { , } [[TMP1]], 1 +// CHECK-NEXT:[[TMP5:%.*]] = tail call @llvm.vector.insert.nxv16i16.nxv8i16( [[TMP3]], [[TMP4]], i64 8) +// CHECK-NEXT:ret [[TMP5]] +// +// CPP-CHECK-LABEL: @_Z15test_svld2q_u16u10__SVBool_tPKt( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT:[[TMP0:%.*]] = tail call @llvm.aarch64.sve.convert.from.svbool.nxv8i1(
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
https://github.com/david-arm edited https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
https://github.com/david-arm commented: Wow, this is a huge patch. :) It took me a few hours to work through all the tests, and it's quite possible I've missed something. However, overall it looks fine and I can't see any major issues. I think there is one missing test, but once that's fixed I'll approve it! https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ``bash git-clang-format --diff e6bd68c90b1a386e276f53ba28fdfdfda48bcea1 a3b7e136e2f045a1c9948b679da89ec9a406516e -- clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_ld1_single.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_loads.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_st1_single.c clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_store.c clang/lib/CodeGen/CGBuiltin.cpp llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h `` View the diff from clang-format here. ``diff diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index e041c78b1e03..02e3d3b5cece 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -10293,7 +10293,7 @@ AArch64TargetLowering::getConstraintType(StringRef Constraint) const { } } else if (parsePredicateConstraint(Constraint) != PredicateConstraint::Invalid) - return C_RegisterClass; +return C_RegisterClass; else if (parseConstraintCode(Constraint) != AArch64CC::Invalid) return C_Other; return TargetLowering::getConstraintType(Constraint); `` https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E, // The vector type that is returned may be different from the // eventual type loaded from memory. auto VectorTy = cast(ReturnTy); - auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); + llvm::ScalableVectorType *MemoryTy = nullptr; + llvm::ScalableVectorType *PredTy = nullptr; + bool IsExtendingLoad = true; + switch (IntrinsicID) { + case Intrinsic::aarch64_sve_ld1uwq: + case Intrinsic::aarch64_sve_ld1udq: +MemoryTy = llvm::ScalableVectorType::get(MemEltTy, 1); +PredTy = +llvm::ScalableVectorType::get(IntegerType::get(getLLVMContext(), 1), 1); momchil-velikov wrote: One always falls through the cracks. Will fix it eventually. https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic; def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; +// +// SVE2.1 - Contiguous loads to quadword (single vector) +// + +class SVE2p1_Single_Load_Quadword +: DefaultAttrsIntrinsic<[llvm_anyvector_ty], +[llvm_nxv1i1_ty, llvm_ptr_ty], +[IntrReadMem]>; +def int_aarch64_sve_ld1uwq : SVE2p1_Single_Load_Quadword; +def int_aarch64_sve_ld1udq : SVE2p1_Single_Load_Quadword; + +// +// SVE2.1 - Contiguous store from quadword (single vector) +// + +class SVE2p1_Single_Store_Quadword +: DefaultAttrsIntrinsic<[], +[llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty], +[IntrArgMemOnly]>; momchil-velikov wrote: Done https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E, auto VectorTy = cast(Ops.back()->getType()); auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); - Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy); + auto PredTy = MemoryTy; + auto AddrMemoryTy = MemoryTy; + bool IsTruncatingStore = true; momchil-velikov wrote: Done https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic; def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; +// +// SVE2.1 - Contiguous loads to quadword (single vector) +// + +class SVE2p1_Single_Load_Quadword +: DefaultAttrsIntrinsic<[llvm_anyvector_ty], +[llvm_nxv1i1_ty, llvm_ptr_ty], +[IntrReadMem]>; momchil-velikov wrote: Done https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E, // The vector type that is returned may be different from the // eventual type loaded from memory. auto VectorTy = cast(ReturnTy); - auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); + llvm::ScalableVectorType *MemoryTy = nullptr; + llvm::ScalableVectorType *PredTy = nullptr; + bool IsExtendingLoad = true; momchil-velikov wrote: Done https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -2614,6 +2619,37 @@ def int_aarch64_sve_ld1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; def int_aarch64_sve_ldnt1_pn_x2 : SVE2p1_Load_PN_X2_Intrinsic; def int_aarch64_sve_ldnt1_pn_x4 : SVE2p1_Load_PN_X4_Intrinsic; +// +// SVE2.1 - Contiguous loads to quadword (single vector) +// + +class SVE2p1_Single_Load_Quadword +: DefaultAttrsIntrinsic<[llvm_anyvector_ty], +[llvm_nxv1i1_ty, llvm_ptr_ty], +[IntrReadMem]>; +def int_aarch64_sve_ld1uwq : SVE2p1_Single_Load_Quadword; +def int_aarch64_sve_ld1udq : SVE2p1_Single_Load_Quadword; + +// +// SVE2.1 - Contiguous store from quadword (single vector) +// + +class SVE2p1_Single_Store_Quadword +: DefaultAttrsIntrinsic<[], +[llvm_anyvector_ty, llvm_nxv1i1_ty, llvm_ptr_ty], +[IntrArgMemOnly]>; david-arm wrote: This also needs the IntrWriteMem flag otherwise we could end up incorrectly rescheduling stores in the wrong place. https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E, // The vector type that is returned may be different from the // eventual type loaded from memory. auto VectorTy = cast(ReturnTy); - auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); + llvm::ScalableVectorType *MemoryTy = nullptr; + llvm::ScalableVectorType *PredTy = nullptr; + bool IsExtendingLoad = true; + switch (IntrinsicID) { + case Intrinsic::aarch64_sve_ld1uwq: + case Intrinsic::aarch64_sve_ld1udq: +MemoryTy = llvm::ScalableVectorType::get(MemEltTy, 1); +PredTy = +llvm::ScalableVectorType::get(IntegerType::get(getLLVMContext(), 1), 1); david-arm wrote: You can just do llvm::ScalableVectorType::get(Type::getInt1Ty(getLLVMContext()), 1); https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9702,17 +9727,34 @@ Value *CodeGenFunction::EmitSVEMaskedStore(const CallExpr *E, auto VectorTy = cast(Ops.back()->getType()); auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); - Value *Predicate = EmitSVEPredicateCast(Ops[0], MemoryTy); + auto PredTy = MemoryTy; + auto AddrMemoryTy = MemoryTy; + bool IsTruncatingStore = true; + ; david-arm wrote: Extra ; here https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SVE2.1] Add intrinsics for quadword loads/stores with unscaled offset (PR #70474)
@@ -9671,28 +9677,47 @@ Value *CodeGenFunction::EmitSVEMaskedLoad(const CallExpr *E, // The vector type that is returned may be different from the // eventual type loaded from memory. auto VectorTy = cast(ReturnTy); - auto MemoryTy = llvm::ScalableVectorType::get(MemEltTy, VectorTy); + llvm::ScalableVectorType *MemoryTy = nullptr; + llvm::ScalableVectorType *PredTy = nullptr; + bool IsExtendingLoad = true; david-arm wrote: I personally think using this variable is misleading because aarch64_sve_ld1uwq is actually an extending load - we're extending from 32-bit memory elements to 128-bit integer elements. So it looks odd when we set this to false. Perhaps it's better to just explicitly have a variable called `IsQuadLoad` and use that instead rather than try to generalise this. The quad-word loads are a really just an exception here because we're working around the lack of a type. So you'd have something like case Intrinsic::aarch64_sve_ld1uwq: IsQuadLoad = true; ... default: IsQuadLoad = false; Function *F = CGM.getIntrinsic(IntrinsicID, IsQuadLoad ? VectorTy : MemoryTy);\ ... if (IsQuadLoad) return Load; https://github.com/llvm/llvm-project/pull/70474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits