[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-05 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu closed 
https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-05 Thread Gaëtan Bossu via llvm-branch-commits

gbossu wrote:

Closing, I'll work on supporting `movprfx` for `ext` instead.

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-01 Thread Gaëtan Bossu via llvm-branch-commits


@@ -0,0 +1,253 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mattr=+sve  -verify-machineinstrs < %s | FileCheck %s --check-prefixes=SVE
+; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=SVE2
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Test vector_splice patterns.
+; Note that this test is similar to named-vector-shuffles-sve.ll, but it focuses
+; on testing all supported types, and a positive "splice index".
+
+
+; i8 elements
+define <vscale x 16 x i8> @splice_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; SVE-LABEL: splice_nxv16i8:
+; SVE:   // %bb.0:
+; SVE-NEXT:ext z0.b, z0.b, z1.b, #1
+; SVE-NEXT:ret
+;
+; SVE2-LABEL: splice_nxv16i8:
+; SVE2:   // %bb.0:
+; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #1
+; SVE2-NEXT:ret
+  %res = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 1)
+  ret <vscale x 16 x i8> %res
+}
+
+; i16 elements
+define <vscale x 8 x i16> @splice_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; SVE-LABEL: splice_nxv8i16:
+; SVE:   // %bb.0:
+; SVE-NEXT:ext z0.b, z0.b, z1.b, #2
+; SVE-NEXT:ret
+;
+; SVE2-LABEL: splice_nxv8i16:
+; SVE2:   // %bb.0:
+; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #2
+; SVE2-NEXT:ret
+  %res = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 1)
+  ret <vscale x 8 x i16> %res
+}
+
+; bf16 elements
+
+define <vscale x 8 x bfloat> @splice_nxv8bfloat(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; SVE-LABEL: splice_nxv8bfloat:
+; SVE:   // %bb.0:
+; SVE-NEXT:ext z0.b, z0.b, z1.b, #2
+; SVE-NEXT:ret
+;
+; SVE2-LABEL: splice_nxv8bfloat:
+; SVE2:   // %bb.0:
+; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #2
+; SVE2-NEXT:ret
+  %res = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bfloat(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b, i32 1)
+  ret <vscale x 8 x bfloat> %res
+}
+
+define <vscale x 4 x bfloat> @splice_nxv4bfloat(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; SVE-LABEL: splice_nxv4bfloat:
+; SVE:   // %bb.0:
+; SVE-NEXT:ext z0.b, z0.b, z1.b, #4
+; SVE-NEXT:ret
+;
+; SVE2-LABEL: splice_nxv4bfloat:
+; SVE2:   // %bb.0:
+; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1
+; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #4
+; SVE2-NEXT:ret
+  %res = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bfloat(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b, i32 1)

gbossu wrote:

⚠️  I have included the same type support for `EXT_ZZI_B` as we have for the 
destructive `EXT_ZZI`, i.e. support for these "weird" types where the fixed 
part isn't 128-bit:
 - 
 - 
 - 
 - 
 - 

I'm not sure why they were here in the first place, and looking at the 
generated code, I think the patterns are wrong. Maybe I should just remove 
those types altogether?

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-01 Thread Gaëtan Bossu via llvm-branch-commits


@@ -109,14 +109,13 @@ define <16 x i16> @two_way_i8_i16_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal
 ; SME-LABEL: two_way_i8_i16_vl256:
 ; SME:   // %bb.0:
 ; SME-NEXT:ldr z0, [x0]
-; SME-NEXT:ldr z1, [x1]
-; SME-NEXT:ldr z2, [x2]
-; SME-NEXT:umlalb z0.h, z2.b, z1.b
-; SME-NEXT:umlalt z0.h, z2.b, z1.b
-; SME-NEXT:mov z1.d, z0.d
-; SME-NEXT:ext z1.b, z1.b, z0.b, #16
-; SME-NEXT:// kill: def $q0 killed $q0 killed $z0
-; SME-NEXT:// kill: def $q1 killed $q1 killed $z1
+; SME-NEXT:ldr z2, [x1]
+; SME-NEXT:ldr z3, [x2]
+; SME-NEXT:umlalb z0.h, z3.b, z2.b
+; SME-NEXT:umlalt z0.h, z3.b, z2.b
+; SME-NEXT:ext z2.b, { z0.b, z1.b }, #16
+; SME-NEXT:// kill: def $q0 killed $q0 killed $z0_z1
+; SME-NEXT:mov z1.d, z2.d

gbossu wrote:

This is one example where we would gain by having subreg liveness.

Currently the `ret` instruction has an implicit use of `z0` and `z1` for ABI 
reasons. This forces a use of all aliasing registers, including `z0_z1`, which 
will be considered live from `umlalt z0.h, z3.b, z2.b`. As a consequence, `ext 
z2.b, { z0.b, z1.b }, #16` cannot be rewritten directly as `ext z1.b, { z0.b, 
z1.b }, #16` as it would create an interference. With subreg liveness enabled, 
we would see there is no interference for `z0_z1.hi`.
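As an annotated sketch of the sequence above (my comments, not compiler output):

    umlalt z0.h, z3.b, z2.b           // defines z0; the aliasing tuple z0_z1 is live from here
    ext    z2.b, { z0.b, z1.b }, #16  // reads the whole z0_z1 tuple
    mov    z1.d, z2.d                 // copy we would like to fold into the ext
    ret                               // implicit use of z0 and z1 (ABI)

Folding the copy would make the `ext` define `z1`, which overlaps the live `z0_z1` tuple, hence the conservative interference.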

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu created 
https://github.com/llvm/llvm-project/pull/152553

They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.

In the previous tests using ld+extract+st, the extract was optimized away and 
replaced by a smaller load at the right offset. This meant we didn't really
test the vector_splice ISD node.

**This is a chained PR**

From a6be08b2dd026b6b3dcd7ca8ed5e231671a160b3 Mon Sep 17 00:00:00 2001
From: Gaëtan Bossu
Date: Wed, 6 Aug 2025 10:32:44 +
Subject: [PATCH] [AArch64][ISel] Extend vector_splice tests (NFC)

They use extract shuffles for fixed vectors, and
llvm.vector.splice intrinsics for scalable vectors.

In the previous tests using ld+extract+st, the extract was optimized
away and replaced by a smaller load at the right offset. This meant
we didn't really test the vector_splice ISD node.
---
 .../sve-fixed-length-extract-subvector.ll | 368 +-
 .../test/CodeGen/AArch64/sve-vector-splice.ll | 162 
 2 files changed, 526 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-vector-splice.ll

diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
index 2dd3269a2..800f95d97af4c 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll
@@ -5,6 +5,12 @@
 
 target triple = "aarch64-unknown-linux-gnu"
 
+; Note that both the vector.extract intrinsics and SK_ExtractSubvector
+; shufflevector instructions get detected as an extract_subvector ISD node in
+; SelectionDAG. We'll test both cases for the sake of completeness, even though
+; vector.extract intrinsics should get lowered into shufflevector by the time we
+; reach the backend.
+
 ; i8
 
 ; Don't use SVE for 64-bit vectors.
@@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) vscale_range(2,0) #0 {
   ret void
 }
 
+define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:str q1, [x1]
+; CHECK-NEXT:str q0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+  store <16 x i8> %hi, ptr %out
+  %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  store <16 x i8> %lo, ptr %out2
+  ret void
+}
+
+define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_half_unaligned:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4
+; CHECK-NEXT:str q0, [x1]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>
+  store <16 x i8> %d, ptr %out
+  ret void
+}
+
+define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, ptr %out4) #0 vscale_range(2,2) {
+; CHECK-LABEL: extract_v32i8_quarters:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:mov z2.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16
+; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24
+; CHECK-NEXT:str d1, [x1]
+; CHECK-NEXT:str d2, [x2]
+; CHECK-NEXT:str d0, [x3]
+; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:str d0, [x4]
+; CHECK-NEXT:ret
+entry:
+  %b = load <32 x i8>, ptr %in
+  %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+  store <8 x i8> %hilo, ptr %out
+  %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> <i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+  store <8 x i8> %hihi, ptr %out2
+  %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  store <8 x i8> %lolo, ptr %out3
+  %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+  store <8 x i8> %lohi, ptr %out4
+  ret void
+}
+
 define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
 ; CHECK-LABEL: extract_subvector_v64i8:
 ; CHECK:   // %bb.0:
@@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 {
   ret void
 }
 
+define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(4,4) {
+; CHECK-LABEL: extract_v64i8_halves:
+; CHECK:   // %bb.0: // %entry
+; CHECK-NEXT:ldr z0, [x0]
+; CHECK-NEXT:ptrue p0.b, vl32
+; CHECK-NEXT:mov z1.d, z0.d
+; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32
+; CHECK-NEXT:st1b { z1.b }, p0, [x1]
+; CHECK-NEXT:st1b { z0.b }, p0, [x2]
+; CHECK-NEXT:ret
+entry:
+  %b = load <64 x i8>, ptr %in
+  %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> <i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
+  store <32 x i8> %hi, ptr

[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu edited 
https://github.com/llvm/llvm-project/pull/152553


[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -0,0 +1,162 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mattr=+sve  -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+; Test vector_splice patterns.
+; Note that this test is similar to named-vector-shuffles-sve.ll, but it focuses
+; on testing all supported types, and a positive "splice index".
+
+
+; i8 elements
+define <vscale x 16 x i8> @splice_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: splice_nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #1
+; CHECK-NEXT:ret
+  %res = call <vscale x 16 x i8> @llvm.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 1)
+  ret <vscale x 16 x i8> %res
+}
+
+; i16 elements
+define <vscale x 8 x i16> @splice_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: splice_nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2
+; CHECK-NEXT:ret
+  %res = call <vscale x 8 x i16> @llvm.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 1)
+  ret <vscale x 8 x i16> %res
+}
+
+; bf16 elements
+
+define <vscale x 8 x bfloat> @splice_nxv8bfloat(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: splice_nxv8bfloat:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2
+; CHECK-NEXT:ret
+  %res = call <vscale x 8 x bfloat> @llvm.vector.splice.nxv8bfloat(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b, i32 1)
+  ret <vscale x 8 x bfloat> %res
+}
+
+define <vscale x 4 x bfloat> @splice_nxv4bfloat(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: splice_nxv4bfloat:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ext z0.b, z0.b, z1.b, #4
+; CHECK-NEXT:ret
+  %res = call <vscale x 4 x bfloat> @llvm.vector.splice.nxv4bfloat(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b, i32 1)
+  ret <vscale x 4 x bfloat> %res
+}

gbossu wrote:

⚠️  Similar to what I had mentioned in a closed PR:
https://github.com/llvm/llvm-project/pull/151730#discussion_r2248448988

We have patterns for `EXT_ZZI` with these "weird" types where the fixed part 
isn't 128-bit:
 - 
 - 
 - 
 - 
 - 

I'm not sure why they were here in the first place, and looking at the 
generated code, I think the patterns are wrong.

https://github.com/llvm/llvm-project/pull/152553


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -150,13 +150,14 @@ define void @fcvtzu_v16f16_v16i32(ptr %a, ptr %b) #0 {
 ; VBITS_GE_256-NEXT:mov x8, #8 // =0x8
 ; VBITS_GE_256-NEXT:ld1h { z0.h }, p0/z, [x0]
 ; VBITS_GE_256-NEXT:ptrue p0.s, vl8
-; VBITS_GE_256-NEXT:uunpklo z1.s, z0.h
-; VBITS_GE_256-NEXT:ext z0.b, z0.b, z0.b, #16
+; VBITS_GE_256-NEXT:movprfx z1, z0
+; VBITS_GE_256-NEXT:ext z1.b, z1.b, z0.b, #16
 ; VBITS_GE_256-NEXT:uunpklo z0.s, z0.h
-; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h
+; VBITS_GE_256-NEXT:uunpklo z1.s, z1.h
 ; VBITS_GE_256-NEXT:fcvtzu z0.s, p0/m, z0.h
-; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1]
-; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2]
+; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h
+; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1]
+; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1, x8, lsl #2]

gbossu wrote:

In that example, we do get one more instruction now (the `movprfx`), but I 
think the schedule is actually better because we eliminate one dependency 
between `ext` and the second `uunpklo`. Now the two `uunpklo` can execute in 
parallel.

This is the theme of the test updates in general: sometimes more instructions, 
but more freedom for the `MachineScheduler`.
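As a rough sketch of the dependency chains (hand-drawn, not tool output):

    // before: ext reads and overwrites z0, serialising the two unpacks
    uunpklo z1 (reads z0) -> ext z0 -> uunpklo z0
    // after: movprfx gives ext a fresh destination, so the chains split
    movprfx z1, z0 ; ext z1 -> uunpklo z1        uunpklo z0 (independent)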

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -256,12 +256,13 @@ define <vscale x 2 x double> @splice_nxv2f64_last_idx(<vscale x 2 x double> %a,
 define <vscale x 2 x i1> @splice_nxv2i1_idx(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) #0 {
 ; CHECK-LABEL: splice_nxv2i1_idx:
 ; CHECK:   // %bb.0:
-; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1
 ; CHECK-NEXT:mov z1.d, p0/z, #1 // =0x1
+; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1
 ; CHECK-NEXT:ptrue p0.d
-; CHECK-NEXT:ext z1.b, z1.b, z0.b, #8
-; CHECK-NEXT:and z1.d, z1.d, #0x1
-; CHECK-NEXT:cmpne p0.d, p0/z, z1.d, #0
+; CHECK-NEXT:mov z0.d, z1.d

gbossu wrote:

This is one case where we get worse due to an extra MOV that could not be 
turned into a MOVPRFX. This is alleviated in the next commit using register 
hints.

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits


@@ -86,6 +83,13 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) {
 Changed = true;
 break;
   }
+  case AArch64::EXT_ZZZI:
+Register DstReg = MI.getOperand(0).getReg();
+Register SrcReg1 = MI.getOperand(1).getReg();
+if (SrcReg1 != DstReg) {
+  MRI->setRegAllocationHint(DstReg, 0, SrcReg1);
+}
+break;

gbossu wrote:

Note that this commit is really just a WIP to show we can slightly improve 
codegen with some hints. I'm not sure it should remain in that PR.

https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-04 Thread Gaëtan Bossu via llvm-branch-commits


@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in {
   let AddedComplexity = 2 in {
 def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))),
   (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>;
+
+foreach VT = [nxv16i8] in
+  def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index)))),
+    (EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;

gbossu wrote:

What do you mean by different output? Do you mean that if you replace the splice 
intrinsics with AArch64's EXT intrinsics, then 
`llvm/test/CodeGen/AArch64/sve-vector-splice.ll` has different CHECK lines?

For a generic splice with two inputs, I'd expect the output to be the same. The 
change I made in the first PR is only for "subvector-extract" splice 
instructions created when lowering vector_extract, where we can mark the second 
input as `undef`.

When you say removing the former, do you mean removing the pattern? Or the 
intrinsic altogether? I would need to refresh my brain after the week-end but I 
think llvm's vector_splice, AArch64 EXT and AArch64 SPLICE all have slightly 
different semantics (especially for negative indices).
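For reference, my mental model of the generic vector_splice semantics, shown on 
fixed-width vectors for readability (this follows the LangRef examples, nothing 
added in this PR):

    ; splice(<A0,A1,A2,A3>, <B0,B1,B2,B3>, 1)  == <A1, A2, A3, B0>
    ; splice(<A0,A1,A2,A3>, <B0,B1,B2,B3>, -1) == <A3, B0, B1, B2>  ; trailing elements of %a
    %r = call <4 x i32> @llvm.vector.splice.v4i32(<4 x i32> %a, <4 x i32> %b, i32 -1)

AArch64's EXT only takes a forward byte offset into the concatenation, which is 
why the negative-index cases need separate lowering.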

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-04 Thread Gaëtan Bossu via llvm-branch-commits


@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in {
   let AddedComplexity = 2 in {
 def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))),
   (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>;
+
+foreach VT = [nxv16i8] in
+  def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index)))),
+    (EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;

gbossu wrote:

I'll check about removing the EXT intrinsic then. 👍  I needed the new pattern 
because SDAG lowers the `vector_extract` nodes to `vector_splice`, not 
AArch64's `EXT`.

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)

2025-08-04 Thread Gaëtan Bossu via llvm-branch-commits


@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in {
   let AddedComplexity = 2 in {
 def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))),
   (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>;
+
+foreach VT = [nxv16i8] in
+  def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index)))),
+    (EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;

gbossu wrote:

Right, I guess I'll leave the intrinsic then

https://github.com/llvm/llvm-project/pull/151730


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2512,9 +2507,11 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
 MulAcc->getCondOp(), MulAcc->isOrdered(),
WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()),
 MulAcc->getDebugLoc()),
-ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()),
 ResultTy(MulAcc->getResultType()),
-IsPartialReduction(MulAcc->isPartialReduction()) {}
+IsPartialReduction(MulAcc->isPartialReduction()) {
+VecOpInfo[0] = MulAcc->getVecOp0Info();
+VecOpInfo[1] = MulAcc->getVecOp1Info();
+  }

gbossu wrote:

Probably a stupid question because I'm not familiar with `VPlan`, but is there 
a reason why this isn't a more standard copy constructor, i.e. taking a `const 
VPMulAccumulateReductionRecipe &` as parameter?
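i.e. something along these lines (just a sketch, keeping the same base-class 
initialisation as the current patch):

    VPMulAccumulateReductionRecipe(const VPMulAccumulateReductionRecipe &MulAcc)
        : VPReductionRecipe(/* same base init as the existing constructor */),
          ResultTy(MulAcc.getResultType()),
          IsPartialReduction(MulAcc.isPartialReduction()) {
      VecOpInfo[0] = MulAcc.getVecOp0Info();
      VecOpInfo[1] = MulAcc.getVecOp1Info();
    }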

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   VPValue *getVecOp1() const { return getOperand(2); }
 
   /// Return if this MulAcc recipe contains extend instructions.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
+  bool isExtended() const {

gbossu wrote:

Nit: Maybe assert that `ExtOp` is either ZExt, SExt, or CastOpsEnd.
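Something like (sketch, reusing the names from this diff):

    assert((ExtOp == Instruction::CastOps::ZExt ||
            ExtOp == Instruction::CastOps::SExt ||
            ExtOp == Instruction::CastOps::CastOpsEnd) &&
           "ExtOp must be ZExt, SExt, or CastOpsEnd");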

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   VPValue *getVecOp1() const { return getOperand(2); }
 
   /// Return if this MulAcc recipe contains extend instructions.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
+  bool isExtended() const {
+return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd;

gbossu wrote:

Is there a reason why we aren't checking `VecOpInfo[1]`? AFAIU their 
`Instruction::CastOps` could be different.
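If mixed extends are allowed, I would have expected something like (sketch):

    bool isExtended() const {
      return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd ||
             getVecOp1Info().ExtOp != Instruction::CastOps::CastOpsEnd;
    }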

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   VPValue *getVecOp1() const { return getOperand(2); }
 
   /// Return if this MulAcc recipe contains extend instructions.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
+  bool isExtended() const {
+return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd;
+  }
 
   /// Return if the operands of mul instruction come from same extend.
-  bool isSameExtend() const { return getVecOp0() == getVecOp1(); }
-
-  /// Return the opcode of the underlying extend.
-  Instruction::CastOps getExtOpcode() const { return ExtOp; }
+  bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); }
 
-  /// Return if the extend opcode is ZExt.
-  bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; }
-
-  /// Return the non negative flag of the ext recipe.
-  bool isNonNeg() const { return IsNonNeg; }
+  VecOperandInfo getVecOp0Info() const { return VecOpInfo[0]; }
+  VecOperandInfo getVecOp1Info() const { return VecOpInfo[1]; }

gbossu wrote:

Super-Nit: Would it make sense to return a const reference? The struct is pretty 
small now, so I guess the copy does not hurt, but maybe the struct will grow 
over time?
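i.e. (sketch):

    const VecOperandInfo &getVecOp0Info() const { return VecOpInfo[0]; }
    const VecOperandInfo &getVecOp1Info() const { return VecOpInfo[1]; }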

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2526,13 +2523,14 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
 R->getCondOp(), R->isOrdered(),
 WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()),
 R->getDebugLoc()),
-ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()),
 ResultTy(ResultTy),
 IsPartialReduction(isa(R)) {
 assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) ==
Instruction::Add &&
"The reduction instruction in MulAccumulateteReductionRecipe must "
"be Add");
+VecOpInfo[0] = {Ext0->getOpcode(), Ext0->isNonNeg()};
+VecOpInfo[1] = {Ext1->getOpcode(), Ext1->isNonNeg()};

gbossu wrote:

Curious: From the description of the `VPMulAccumulateReductionRecipe` class, it 
seems that the extending operations are optional. Yet, this code seems to 
assume `Ext0` and `Ext1` aren't null. Does that mean that these widen recipes 
are always valid, but sometimes they represent an "identity" transformation?

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   VPValue *getVecOp1() const { return getOperand(2); }
 
   /// Return if this MulAcc recipe contains extend instructions.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
+  bool isExtended() const {

gbossu wrote:

It's just that in other places of the code, I think there is an assumption that 
`isExtended()` is equivalent to `ZExt || SExt`, while there are other types 
of `CastOps` like "FP to Int".

Please ignore me, this is a very pedantic comment ;)

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)

2025-05-13 Thread Gaëtan Bossu via llvm-branch-commits


@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
   VPValue *getVecOp1() const { return getOperand(2); }
 
   /// Return if this MulAcc recipe contains extend instructions.
-  bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; }
+  bool isExtended() const {
+return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd;

gbossu wrote:

But could it happen that Op0 is not extended, and Op1 is? (Probably a stupid 
question because I'm reading this code without prior knowledge about `VPlan` 
stuff 😄)

https://github.com/llvm/llvm-project/pull/136997


[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)

2025-08-07 Thread Gaëtan Bossu via llvm-branch-commits

https://github.com/gbossu edited 
https://github.com/llvm/llvm-project/pull/152554


[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)

2025-09-02 Thread Gaëtan Bossu via llvm-branch-commits


@@ -250,6 +286,9 @@ struct MachineSMEABI : public MachineFunctionPass {
 SmallVector BundleStates;

gbossu wrote:

We are starting to accumulate a lot of state, which makes the code harder to 
follow as it allows any member function to modify it instead of having clear 
ins/outs.

I know we discussed that already 
[here](https://github.com/llvm/llvm-project/pull/149062#discussion_r2276758787),
 but I feel it would be nice not to delay the refactoring too much. Even having 
a first step that collects all the info in a struct would help. We could then 
pass that info around by const ref to any function that needs it. If some info 
needs to be mutable, then it should not be in the struct, and be a clear in/out 
parameter.

Doing something like this would clearly decouple the "collection" phase from 
the "let me correctly handle the state changes" phase.

https://github.com/llvm/llvm-project/pull/149064


[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)

2025-09-03 Thread Gaëtan Bossu via llvm-branch-commits


@@ -250,6 +286,9 @@ struct MachineSMEABI : public MachineFunctionPass {
 SmallVector BundleStates;
 std::optional TPIDR2Block;
 std::optional AfterSMEProloguePt;
+Register AgnosticZABufferPtr = AArch64::NoRegister;
+LiveRegs PhysLiveRegsAfterSMEPrologue = LiveRegs::None;
+bool HasFullZASaveRestore = false;

gbossu wrote:

Could you document what a "full ZA save/restore" is? How does this differ from 
the "plain" save/restore?

https://github.com/llvm/llvm-project/pull/149064


[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)

2025-09-03 Thread Gaëtan Bossu via llvm-branch-commits


@@ -200,7 +200,7 @@ struct MachineSMEABI : public MachineFunctionPass {
 
   /// Inserts code to handle changes between ZA states within the function.
   /// E.g., ACTIVE -> LOCAL_SAVED will insert code required to save ZA.
-  void insertStateChanges();
+  void insertStateChanges(bool IsAgnosticZA);

gbossu wrote:

Also, are `IsAgnosticZA` and `!IsAgnosticZA` the two kinds of functions we'll 
ever have to handle? If not, should we already turn this into an enum?
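e.g. (hypothetical sketch):

    enum class ZAFunctionKind { PrivateOrSharedZA, AgnosticZA /*, future kinds? */ };
    void insertStateChanges(ZAFunctionKind Kind);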

https://github.com/llvm/llvm-project/pull/149064