[PATCH] D153128: [AArch64][RCPC3] Add Neon intrinsics for LDAP1 and STL1

2023-06-19 Thread Tomas Matheson via Phabricator via cfe-commits
tmatheson accepted this revision.
tmatheson added a comment.
This revision is now accepted and ready to land.

LGTM. ACLE PR here: https://github.com/ARM-software/acle/pull/265




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:6769
+  // and vstl1(q)_lane, but codegen is equivalent for all of them. Choose an
+  // arbitrary one to be handled as the canonical variation.
+  { NEON::BI__builtin_neon_vldap1_lane_u64, NEON::BI__builtin_neon_vldap1_lane_s64 },




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153128/new/

https://reviews.llvm.org/D153128

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153128: [AArch64][RCPC3] Add Neon intrinsics for LDAP1 and STL1

2023-06-16 Thread Lucas Prates via Phabricator via cfe-commits
pratlucas created this revision.
pratlucas added reviewers: tmatheson, vhscampos, dmgreen.
Herald added a subscriber: kristof.beyls.
Herald added a project: All.
pratlucas requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This adds new intrinsics to support the LDAP1 and STL1 Advanced SIMD
(Neon) instructions introduced as part of FEAT_LRCPC3.
The new intrinsics `vldap1(q)_lane`/`vstl1(q)_lane` generate IR
similar to the existing `vld1(q)_lane`/`vst1(q)_lane` ones, except that
the memory access is an atomic load-acquire or store-release.
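
As a rough illustration of the semantics described above, the following
portable C11 sketch models a 64-bit lane load-acquire and store-release
with `<stdatomic.h>`. It is not the patch's implementation and does not
use the real intrinsics: the `u64x2_model` type and the `model_*`
function names are hypothetical stand-ins, and the lane index is an
ordinary run-time argument here, whereas the tests in this patch pass a
literal lane index.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for uint64x2_t: two 64-bit lanes. */
typedef struct {
  uint64_t lane[2];
} u64x2_model;

/* Models vldap1q_lane_u64: atomically load one element with acquire
   ordering, then insert it into the chosen lane of the vector. */
static inline u64x2_model model_vldap1q_lane_u64(_Atomic uint64_t *src,
                                                 u64x2_model vec, int lane) {
  vec.lane[lane] = atomic_load_explicit(src, memory_order_acquire);
  return vec;
}

/* Models vstl1q_lane_u64: extract the chosen lane, then atomically store
   it with release ordering. */
static inline void model_vstl1q_lane_u64(_Atomic uint64_t *dst,
                                         u64x2_model vec, int lane) {
  atomic_store_explicit(dst, vec.lane[lane], memory_order_release);
}
```

The load half corresponds to the `load atomic i64 ... acquire` followed
by `insertelement` that the new test checks for. The store half is the
assumed mirror image: extract the lane, then store it with release
ordering.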

The LLVM code generation changes to ensure that this instruction pair
is lowered to the correct LDAP1/STL1 instructions will be covered in a
separate commit.

Based on a patch by Sam Elliott.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D153128

Files:
  clang/include/clang/Basic/arm_neon.td
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/aarch64-neon-ldst-one-rcpc3.c
  clang/utils/TableGen/NeonEmitter.cpp

Index: clang/utils/TableGen/NeonEmitter.cpp
===
--- clang/utils/TableGen/NeonEmitter.cpp
+++ clang/utils/TableGen/NeonEmitter.cpp
@@ -2086,12 +2086,13 @@
 
 std::string Name = Def->getName();
 // Omit type checking for the pointer arguments of vld1_lane, vld1_dup,
-// and vst1_lane intrinsics.  Using a pointer to the vector element
-// type with one of those operations causes codegen to select an aligned
-// load/store instruction.  If you want an unaligned operation,
-// the pointer argument needs to have less alignment than element type,
-// so just accept any pointer type.
-if (Name == "vld1_lane" || Name == "vld1_dup" || Name == "vst1_lane") {
+// vst1_lane, vldap1_lane, and vstl1_lane intrinsics.  Using a pointer to
+// the vector element type with one of those operations causes codegen to
+// select an aligned load/store instruction.  If you want an unaligned
+// operation, the pointer argument needs to have less alignment than element
+// type, so just accept any pointer type.
+if (Name == "vld1_lane" || Name == "vld1_dup" || Name == "vst1_lane" ||
+Name == "vldap1_lane" || Name == "vstl1_lane") {
   PtrArgNum = -1;
   HasConstPtr = false;
 }
Index: clang/test/CodeGen/aarch64-neon-ldst-one-rcpc3.c
===
--- /dev/null
+++ clang/test/CodeGen/aarch64-neon-ldst-one-rcpc3.c
@@ -0,0 +1,201 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple aarch64-arm-none-eabi -target-feature +neon \
+// RUN:  -target-feature +rcpc3 -disable-O0-optnone -emit-llvm -o - %s \
+// RUN: | opt -S -passes=mem2reg | FileCheck %s
+
+// REQUIRES: aarch64-registered-target
+
+#include <arm_neon.h>
+
+
+// CHECK-LABEL: @test_vldap1q_lane_u64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = bitcast <2 x i64> [[B:%.*]] to <16 x i8>
+// CHECK-NEXT:[[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
+// CHECK-NEXT:[[TMP2:%.*]] = load atomic i64, ptr [[A:%.*]] acquire, align 8
+// CHECK-NEXT:[[VLDAP1_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[TMP2]], i32 1
+// CHECK-NEXT:ret <2 x i64> [[VLDAP1_LANE]]
+//
+uint64x2_t test_vldap1q_lane_u64(uint64_t  *a, uint64x2_t b) {
+  return vldap1q_lane_u64(a, b, 1);
+}
+
+// CHECK-LABEL: @test_vldap1q_lane_s64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = bitcast <2 x i64> [[B:%.*]] to <16 x i8>
+// CHECK-NEXT:[[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
+// CHECK-NEXT:[[TMP2:%.*]] = load atomic i64, ptr [[A:%.*]] acquire, align 8
+// CHECK-NEXT:[[VLDAP1_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[TMP2]], i32 1
+// CHECK-NEXT:ret <2 x i64> [[VLDAP1_LANE]]
+//
+int64x2_t test_vldap1q_lane_s64(int64_t  *a, int64x2_t b) {
+  return vldap1q_lane_s64(a, b, 1);
+}
+
+// CHECK-LABEL: @test_vldap1q_lane_f64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = bitcast <2 x double> [[B:%.*]] to <16 x i8>
+// CHECK-NEXT:[[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x double>
+// CHECK-NEXT:[[TMP2:%.*]] = load atomic double, ptr [[A:%.*]] acquire, align 8
+// CHECK-NEXT:[[VLDAP1_LANE:%.*]] = insertelement <2 x double> [[TMP1]], double [[TMP2]], i32 1
+// CHECK-NEXT:ret <2 x double> [[VLDAP1_LANE]]
+//
+float64x2_t test_vldap1q_lane_f64(float64_t  *a, float64x2_t b) {
+  return vldap1q_lane_f64(a, b, 1);
+}
+
+// CHECK-LABEL: @test_vldap1q_lane_p64(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[TMP0:%.*]] = bitcast <2 x i64> [[B:%.*]] to <16 x i8>
+// CHECK-NEXT:[[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <2 x i64>
+// CHECK-NEXT:[[TMP2:%.*]] = load atomic i64, ptr [[A:%.*]] acquire, align 8
+// CHECK-NEXT:[[VLDAP1_LANE:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[TMP2]], i32 1
+// CHECK-NEXT:ret <2 x i64>