[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
https://github.com/gbossu closed https://github.com/llvm/llvm-project/pull/151730 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
gbossu wrote: Closing, I'll work on supporting `movprfx` for `ext` instead.
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
@@ -0,0 +1,253 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mattr=+sve -verify-machineinstrs < %s | FileCheck %s --check-prefixes=SVE +; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=SVE2 + +target triple = "aarch64-unknown-linux-gnu" + +; Test vector_splice patterns. +; Note that this test is similar to named-vector-shuffles-sve.ll, but it focuses +; on testing all supported types, and a positive "splice index". + + +; i8 elements +define @splice_nxv16i8( %a, %b) { +; SVE-LABEL: splice_nxv16i8: +; SVE: // %bb.0: +; SVE-NEXT:ext z0.b, z0.b, z1.b, #1 +; SVE-NEXT:ret +; +; SVE2-LABEL: splice_nxv16i8: +; SVE2: // %bb.0: +; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #1 +; SVE2-NEXT:ret + %res = call @llvm.vector.splice.nxv16i8( %a, %b, i32 1) + ret %res +} + +; i16 elements +define @splice_nxv8i16( %a, %b) { +; SVE-LABEL: splice_nxv8i16: +; SVE: // %bb.0: +; SVE-NEXT:ext z0.b, z0.b, z1.b, #2 +; SVE-NEXT:ret +; +; SVE2-LABEL: splice_nxv8i16: +; SVE2: // %bb.0: +; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #2 +; SVE2-NEXT:ret + %res = call @llvm.vector.splice.nxv8i16( %a, %b, i32 1) + ret %res +} + +; bf16 elements + +define @splice_nxv8bfloat( %a, %b) { +; SVE-LABEL: splice_nxv8bfloat: +; SVE: // %bb.0: +; SVE-NEXT:ext z0.b, z0.b, z1.b, #2 +; SVE-NEXT:ret +; +; SVE2-LABEL: splice_nxv8bfloat: +; SVE2: // %bb.0: +; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #2 +; SVE2-NEXT:ret + %res = call @llvm.vector.splice.nxv8bfloat( %a, %b, i32 1) + ret %res +} + +define @splice_nxv4bfloat( %a, %b) { +; SVE-LABEL: 
splice_nxv4bfloat: +; SVE: // %bb.0: +; SVE-NEXT:ext z0.b, z0.b, z1.b, #4 +; SVE-NEXT:ret +; +; SVE2-LABEL: splice_nxv4bfloat: +; SVE2: // %bb.0: +; SVE2-NEXT:// kill: def $z1 killed $z1 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:// kill: def $z0 killed $z0 killed $z0_z1 def $z0_z1 +; SVE2-NEXT:ext z0.b, { z0.b, z1.b }, #4 +; SVE2-NEXT:ret + %res = call @llvm.vector.splice.nxv4bfloat( %a, %b, i32 1)
gbossu wrote: ⚠️ I have included the same type support for `EXT_ZZI_B` as we have for the destructive `EXT_ZZI`, i.e. support for these "weird" types where the fixed part isn't 128-bit: - - - - - I'm not sure why they were here in the first place, and looking at the generated code, I think the patterns are wrong. Maybe I should just remove those types altogether?
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
@@ -109,14 +109,13 @@ define <16 x i16> @two_way_i8_i16_vl256(ptr %accptr, ptr %uptr, ptr %sptr) vscal ; SME-LABEL: two_way_i8_i16_vl256: ; SME: // %bb.0: ; SME-NEXT:ldr z0, [x0] -; SME-NEXT:ldr z1, [x1] -; SME-NEXT:ldr z2, [x2] -; SME-NEXT:umlalb z0.h, z2.b, z1.b -; SME-NEXT:umlalt z0.h, z2.b, z1.b -; SME-NEXT:mov z1.d, z0.d -; SME-NEXT:ext z1.b, z1.b, z0.b, #16 -; SME-NEXT:// kill: def $q0 killed $q0 killed $z0 -; SME-NEXT:// kill: def $q1 killed $q1 killed $z1 +; SME-NEXT:ldr z2, [x1] +; SME-NEXT:ldr z3, [x2] +; SME-NEXT:umlalb z0.h, z3.b, z2.b +; SME-NEXT:umlalt z0.h, z3.b, z2.b +; SME-NEXT:ext z2.b, { z0.b, z1.b }, #16 +; SME-NEXT:// kill: def $q0 killed $q0 killed $z0_z1 +; SME-NEXT:mov z1.d, z2.d
gbossu wrote: This is one example where we would gain by having subreg liveness. Currently the `ret` instruction has an implicit use of `z0` and `z1` for ABI reasons. This forces a use of all aliasing registers, including `z0_z1`, which will be considered live from `umlalt z0.h, z3.b, z2.b`. As a consequence, `ext z2.b, { z0.b, z1.b }, #16` cannot be rewritten directly as `ext z1.b, { z0.b, z1.b }, #16` as it would create an interference. With subreg liveness enabled, we would see there is no interference for `z0_z1.hi`.
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
https://github.com/gbossu created https://github.com/llvm/llvm-project/pull/152553 They use extract shuffles for fixed vectors, and llvm.vector.splice intrinsics for scalable vectors. In the previous tests using ld+extract+st, the extract was optimized away and replaced by a smaller load at the right offset. This meant we didn't really test the vector_splice ISD node. **This is a chained PR** From a6be08b2dd026b6b3dcd7ca8ed5e231671a160b3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ga=C3=ABtan=20Bossu?= Date: Wed, 6 Aug 2025 10:32:44 + Subject: [PATCH] [AArch64][ISel] Extend vector_splice tests (NFC) They use extract shuffles for fixed vectors, and llvm.vector.splice intrinsics for scalable vectors. In the previous tests using ld+extract+st, the extract was optimized away and replaced by a smaller load at the right offset. This meant we didn't really test the vector_splice ISD node. --- .../sve-fixed-length-extract-subvector.ll | 368 +- .../test/CodeGen/AArch64/sve-vector-splice.ll | 162 2 files changed, 526 insertions(+), 4 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sve-vector-splice.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll index 2dd3269a2..800f95d97af4c 100644 --- a/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-extract-subvector.ll @@ -5,6 +5,12 @@ target triple = "aarch64-unknown-linux-gnu" +; Note that both the vector.extract intrinsics and SK_ExtractSubvector +; shufflevector instructions get detected as an extract_subvector ISD node in +; SelectionDAG. We'll test both cases for the sake of completeness, even though +; vector.extract intrinsics should get lowered into shufflevector by the time we +; reach the backend. + ; i8 ; Don't use SVE for 64-bit vectors.
@@ -40,6 +46,67 @@ define void @extract_subvector_v32i8(ptr %a, ptr %b) vscale_range(2,0) #0 { ret void } +define void @extract_v32i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:str q1, [x1] +; CHECK-NEXT:str q0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hi = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %hi, ptr %out + %lo = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %lo, ptr %out2 + ret void +} + +define void @extract_v32i8_half_unaligned(ptr %in, ptr %out) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_half_unaligned: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext v0.16b, v0.16b, v1.16b, #4 +; CHECK-NEXT:str q0, [x1] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %d = shufflevector <32 x i8> %b, <32 x i8> poison, <16 x i32> + store <16 x i8> %d, ptr %out + ret void +} + +define void @extract_v32i8_quarters(ptr %in, ptr %out, ptr %out2, ptr %out3, ptr %out4) #0 vscale_range(2,2) { +; CHECK-LABEL: extract_v32i8_quarters: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:mov z2.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #16 +; CHECK-NEXT:ext z2.b, z2.b, z0.b, #24 +; CHECK-NEXT:str d1, [x1] +; CHECK-NEXT:str d2, [x2] +; CHECK-NEXT:str d0, [x3] +; CHECK-NEXT:ext z0.b, z0.b, z0.b, #8 +; CHECK-NEXT:str d0, [x4] +; CHECK-NEXT:ret +entry: + %b = load <32 x i8>, ptr %in + %hilo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hilo, ptr %out + %hihi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %hihi, ptr %out2 + %lolo = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + 
store <8 x i8> %lolo, ptr %out3 + %lohi = shufflevector <32 x i8> %b, <32 x i8> poison, <8 x i32> + store <8 x i8> %lohi, ptr %out4 + ret void +} + define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ; CHECK-LABEL: extract_subvector_v64i8: ; CHECK: // %bb.0: @@ -54,6 +121,25 @@ define void @extract_subvector_v64i8(ptr %a, ptr %b) #0 { ret void } +define void @extract_v64i8_halves(ptr %in, ptr %out, ptr %out2) #0 vscale_range(4,4) { +; CHECK-LABEL: extract_v64i8_halves: +; CHECK: // %bb.0: // %entry +; CHECK-NEXT:ldr z0, [x0] +; CHECK-NEXT:ptrue p0.b, vl32 +; CHECK-NEXT:mov z1.d, z0.d +; CHECK-NEXT:ext z1.b, z1.b, z0.b, #32 +; CHECK-NEXT:st1b { z1.b }, p0, [x1] +; CHECK-NEXT:st1b { z0.b }, p0, [x2] +; CHECK-NEXT:ret +entry: + %b = load <64 x i8>, ptr %in + %hi = shufflevector <64 x i8> %b, <64 x i8> poison, <32 x i32> + store <32 x i8> %hi, ptr
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
https://github.com/gbossu edited https://github.com/llvm/llvm-project/pull/152553
[llvm-branch-commits] [llvm] [AArch64][ISel] Extend vector_splice tests (NFC) (PR #152553)
@@ -0,0 +1,162 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py +; RUN: llc -mattr=+sve -verify-machineinstrs < %s | FileCheck %s +; RUN: llc -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s + +target triple = "aarch64-unknown-linux-gnu" + +; Test vector_splice patterns. +; Note that this test is similar to named-vector-shuffles-sve.ll, but it focuses +; on testing all supported types, and a positive "splice index". + + +; i8 elements +define @splice_nxv16i8( %a, %b) { +; CHECK-LABEL: splice_nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #1 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv16i8( %a, %b, i32 1) + ret %res +} + +; i16 elements +define @splice_nxv8i16( %a, %b) { +; CHECK-LABEL: splice_nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv8i16( %a, %b, i32 1) + ret %res +} + +; bf16 elements + +define @splice_nxv8bfloat( %a, %b) { +; CHECK-LABEL: splice_nxv8bfloat: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #2 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv8bfloat( %a, %b, i32 1) + ret %res +} + +define @splice_nxv4bfloat( %a, %b) { +; CHECK-LABEL: splice_nxv4bfloat: +; CHECK: // %bb.0: +; CHECK-NEXT:ext z0.b, z0.b, z1.b, #4 +; CHECK-NEXT:ret + %res = call @llvm.vector.splice.nxv4bfloat( %a, %b, i32 1) + ret %res +}
gbossu wrote: ⚠️ Similar to what I had mentioned in a closed PR: https://github.com/llvm/llvm-project/pull/151730#discussion_r2248448988 We have patterns for `EXT_ZZI` with these "weird" types where the fixed part isn't 128-bit: - - - - - I'm not sure why they were here in the first place, and looking at the generated code, I think the patterns are wrong.
https://github.com/llvm/llvm-project/pull/152553
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -150,13 +150,14 @@ define void @fcvtzu_v16f16_v16i32(ptr %a, ptr %b) #0 { ; VBITS_GE_256-NEXT:mov x8, #8 // =0x8 ; VBITS_GE_256-NEXT:ld1h { z0.h }, p0/z, [x0] ; VBITS_GE_256-NEXT:ptrue p0.s, vl8 -; VBITS_GE_256-NEXT:uunpklo z1.s, z0.h -; VBITS_GE_256-NEXT:ext z0.b, z0.b, z0.b, #16 +; VBITS_GE_256-NEXT:movprfx z1, z0 +; VBITS_GE_256-NEXT:ext z1.b, z1.b, z0.b, #16 ; VBITS_GE_256-NEXT:uunpklo z0.s, z0.h -; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h +; VBITS_GE_256-NEXT:uunpklo z1.s, z1.h ; VBITS_GE_256-NEXT:fcvtzu z0.s, p0/m, z0.h -; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1] -; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2] +; VBITS_GE_256-NEXT:fcvtzu z1.s, p0/m, z1.h +; VBITS_GE_256-NEXT:st1w { z0.s }, p0, [x1] +; VBITS_GE_256-NEXT:st1w { z1.s }, p0, [x1, x8, lsl #2]
gbossu wrote: In that example, we do get one more instruction now (the `movprfx`), but I think the schedule is actually better because we eliminate one dependency between `ext` and the second `uunpklo`. Now the two `uunpklo` can execute in parallel. This is the theme of the test updates in general: sometimes more instructions, but more freedom for the `MachineScheduler`.
https://github.com/llvm/llvm-project/pull/152554
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -256,12 +256,13 @@ define @splice_nxv2f64_last_idx( %a, define @splice_nxv2i1_idx( %a, %b) #0 { ; CHECK-LABEL: splice_nxv2i1_idx: ; CHECK: // %bb.0: -; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1 ; CHECK-NEXT:mov z1.d, p0/z, #1 // =0x1 +; CHECK-NEXT:mov z0.d, p1/z, #1 // =0x1 ; CHECK-NEXT:ptrue p0.d -; CHECK-NEXT:ext z1.b, z1.b, z0.b, #8 -; CHECK-NEXT:and z1.d, z1.d, #0x1 -; CHECK-NEXT:cmpne p0.d, p0/z, z1.d, #0 +; CHECK-NEXT:mov z0.d, z1.d
gbossu wrote: This is one case where we get worse due to an extra MOV that could not be turned into a MOVPRFX. This is alleviated in the next commit using register hints.
https://github.com/llvm/llvm-project/pull/152554
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
@@ -86,6 +83,13 @@ bool AArch64PostCoalescer::runOnMachineFunction(MachineFunction &MF) { Changed = true; break; } + case AArch64::EXT_ZZZI: +Register DstReg = MI.getOperand(0).getReg(); +Register SrcReg1 = MI.getOperand(1).getReg(); +if (SrcReg1 != DstReg) { + MRI->setRegAllocationHint(DstReg, 0, SrcReg1); +} +break;
gbossu wrote: Note that this commit is really just a WIP to show we can slightly improve codegen with some hints. I'm not sure it should remain in that PR.
https://github.com/llvm/llvm-project/pull/152554
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in { let AddedComplexity = 2 in { def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))), (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>; + +foreach VT = [nxv16i8] in + def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index, +(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;
gbossu wrote: What do you mean by different output? Do you mean that if you replace the splice intrinsics with AArch64's EXT intrinsics, then `llvm/test/CodeGen/AArch64/sve-vector-splice.ll` has different CHECK lines? For a generic splice with two inputs, I'd expect the output to be the same. The change I made in the first PR is only for "subvector-extract" splice instructions created when lowering vector_extract, where we can mark the second input as `undef`. When you say removing the former, do you mean removing the pattern? Or the intrinsic altogether? I would need to refresh my brain after the weekend, but I think llvm's vector_splice, AArch64 EXT and AArch64 SPLICE all have slightly different semantics (especially for negative indices).
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in { let AddedComplexity = 2 in { def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))), (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>; + +foreach VT = [nxv16i8] in + def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index, +(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;
gbossu wrote: I'll check about removing the EXT intrinsic then. 👍 I needed the new pattern because SDAG lowers the `vector_extract` nodes to `vector_splice`, not AArch64's `EXT`.
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive SVE2 ext instruction (PR #151730)
@@ -4069,6 +4069,22 @@ let Predicates = [HasSVE2_or_SME] in { let AddedComplexity = 2 in { def : Pat<(nxv16i8 (AArch64ext nxv16i8:$zn1, nxv16i8:$zn2, (i32 imm0_255:$imm))), (EXT_ZZI_B (REG_SEQUENCE ZPR2, $zn1, zsub0, $zn2, zsub1), imm0_255:$imm)>; + +foreach VT = [nxv16i8] in + def : Pat<(VT (vector_splice VT:$Z1, VT:$Z2, (i64 (sve_ext_imm_0_255 i32:$index, +(EXT_ZZI_B (REG_SEQUENCE ZPR2, $Z1, zsub0, $Z2, zsub1), imm0_255:$index)>;
gbossu wrote: Right, I guess I'll leave the intrinsic then.
https://github.com/llvm/llvm-project/pull/151730
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2512,9 +2507,11 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), ResultTy(MulAcc->getResultType()), -IsPartialReduction(MulAcc->isPartialReduction()) {} +IsPartialReduction(MulAcc->isPartialReduction()) {
+VecOpInfo[0] = MulAcc->getVecOp0Info();
+VecOpInfo[1] = MulAcc->getVecOp1Info();
+ }
gbossu wrote: Probably a stupid question because I'm not familiar with `VPlan`, but is there a reason why this isn't a more standard copy constructor, i.e. taking a `const VPMulAccumulateReductionRecipe &` as parameter?
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const {
gbossu wrote: Nit: Maybe assert that `ExtOp` is either ZExt, SExt, or CastOpsEnd.
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { +return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd;
gbossu wrote: Is there a reason why we aren't checking `VecOpInfo[1]`? AFAIU their `Instruction::CastOps` could be different.
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { +return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd; + } /// Return if the operands of mul instruction come from same extend. - bool isSameExtend() const { return getVecOp0() == getVecOp1(); } - - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } - - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + VecOperandInfo getVecOp0Info() const { return VecOpInfo[0]; } + VecOperandInfo getVecOp1Info() const { return VecOpInfo[1]; }
gbossu wrote: Super-Nit: Would it make sense to return a const reference? The struct is pretty small now, so I guess the copy does not hurt, but maybe the struct will grow over time?
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2526,13 +2523,14 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateteReductionRecipe must " "be Add"); +VecOpInfo[0] = {Ext0->getOpcode(), Ext0->isNonNeg()}; +VecOpInfo[1] = {Ext1->getOpcode(), Ext1->isNonNeg()};
gbossu wrote: Curious: From the description of the `VPMulAccumulateReductionRecipe` class, it seems that the extending operations are optional. Yet, this code seems to assume `Ext0` and `Ext1` aren't null. Does that mean that these widen recipes are always valid, but sometimes they represent an "identity" transformation?
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const {
gbossu wrote: It's just that in other places of the code, I think there is an assumption that `isExtended()` is equivalent to `ZExt || SExt` while there are other types of `CastOps` like "FP to Int". Please ignore me, this is a very pedantic comment ;)
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { +return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd;
gbossu wrote: But could it happen that Op0 is not extended, and Op1 is? (Probably a stupid question because I'm reading this code without prior knowledge about `VPlan` stuff 😄)
https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [AArch64][ISel] Select constructive EXT_ZZZI pseudo instruction (PR #152554)
https://github.com/gbossu edited https://github.com/llvm/llvm-project/pull/152554
[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)
@@ -250,6 +286,9 @@ struct MachineSMEABI : public MachineFunctionPass { SmallVector BundleStates;
gbossu wrote: We are starting to accumulate a lot of state, which makes the code harder to follow as it allows any member function to modify it instead of having clear ins/outs. I know we discussed that already [here](https://github.com/llvm/llvm-project/pull/149062#discussion_r2276758787), but I feel it would be nice not to delay the refactoring too much. Even having a first step that collects all the info in a struct would help. We could then pass that info around by const ref to any function that needs it. If some info needs to be mutable, then it should not be in the struct, and be a clear in/out parameter. Doing something like this would clearly decouple the "collection" phase from the "let me correctly handle the state changes" phase.
https://github.com/llvm/llvm-project/pull/149064
[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)
@@ -250,6 +286,9 @@ struct MachineSMEABI : public MachineFunctionPass { SmallVector BundleStates; std::optional TPIDR2Block; std::optional AfterSMEProloguePt; +Register AgnosticZABufferPtr = AArch64::NoRegister; +LiveRegs PhysLiveRegsAfterSMEPrologue = LiveRegs::None; +bool HasFullZASaveRestore = false;
gbossu wrote: Could you document what a "full ZA save/restore" is? How does this differ from the "plain" save/restore?
https://github.com/llvm/llvm-project/pull/149064
[llvm-branch-commits] [llvm] [AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (PR #149064)
@@ -200,7 +200,7 @@ struct MachineSMEABI : public MachineFunctionPass { /// Inserts code to handle changes between ZA states within the function. /// E.g., ACTIVE -> LOCAL_SAVED will insert code required to save ZA. - void insertStateChanges(); + void insertStateChanges(bool IsAgnosticZA);
gbossu wrote: Also, are `IsAgnosticZA` and `!IsAgnosticZA` the two types of functions we'll ever have to handle? If not, should we already turn this into an enum?
https://github.com/llvm/llvm-project/pull/149064