Issue 181514
Summary [AArch64] manual deinterleaving `ld2` not recognized
Labels new issue
Assignees
Reporter folkertdev
A manual 16-bit `ld4` (a normal load followed by a deinterleaving shuffle) is recognized and lowered to a single `ld4`. For some reason the same is not true for `ld2`: the manual version is lowered to a longer instruction sequence instead of a single `ld2`.

https://godbolt.org/z/danjGfMb9

```asm
manual2:
        ldr     q1, [x0]
        ext     v2.16b, v1.16b, v1.16b, #8
        uzp1    v0.4h, v1.4h, v2.4h
        uzp2    v1.4h, v1.4h, v2.4h
        ret

intrin2:
        ld2     { v0.4h, v1.4h }, [x0]
        ret

manual4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret

intrin4:
        ld4     { v0.4h, v1.4h, v2.4h, v3.4h }, [x0]
        stp     d0, d1, [x8]
        stp     d2, d3, [x8, #16]
        ret
```
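
To make the comparison concrete, here is a small Python model of the `manual2` sequence next to the lane semantics of `ld2 { v0.4h, v1.4h }`. It is only a sketch: the helper names (`ext`, `uzp1`, `uzp2`, `manual2`, `ld2_4h`) are made up for illustration, registers are modelled as plain lists of 16-bit lanes, and the `ext` immediate is given in halfword lanes rather than bytes.

```python
def ext(n, m, lanes):
    # EXT: take a register-sized window from the concatenation n:m,
    # starting `lanes` elements in (here in halfword lanes, so #8 bytes == 4).
    return (n + m)[lanes:lanes + len(n)]

def uzp1(n, m):
    # UZP1: even-indexed elements of the concatenation n:m.
    return (n + m)[0::2]

def uzp2(n, m):
    # UZP2: odd-indexed elements of the concatenation n:m.
    return (n + m)[1::2]

def manual2(mem):
    v1 = list(mem[:8])          # ldr  q1, [x0]
    v2 = ext(v1, v1, 4)         # ext  v2.16b, v1.16b, v1.16b, #8
    lo1, lo2 = v1[:4], v2[:4]   # the .4h forms only read the low 64 bits
    return uzp1(lo1, lo2), uzp2(lo1, lo2)

def ld2_4h(mem):
    # ld2 { v0.4h, v1.4h }: even elements go to v0, odd elements to v1.
    return list(mem[0:8:2]), list(mem[1:8:2])

mem = [10, 11, 12, 13, 14, 15, 16, 17]
assert manual2(mem) == ld2_4h(mem)
```

Under this model both functions return the same pair of lane vectors, so the four-instruction sequence and the single `ld2` compute the same result; the difference is only in instruction count.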

The issue is that the `VectorCombinePass` turns

```llvm
  %0 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %1 = shufflevector <8 x i16> %tmp.sroa.0.0.copyload.i, <8 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  %2 = bitcast <4 x i16> %0 to <8 x i8>
  %3 = bitcast <4 x i16> %1 to <8 x i8>
```

into

```llvm
  %0 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
  %1 = shufflevector <16 x i8> %0, <16 x i8> poison, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 8, i32 9, i32 12, i32 13>
  %2 = bitcast <8 x i16> %tmp.sroa.0.0.copyload.i to <16 x i8>
  %3 = shufflevector <16 x i8> %2, <16 x i8> poison, <8 x i32> <i32 2, i32 3, i32 6, i32 7, i32 10, i32 11, i32 14, i32 15>
```

which presumably breaks the `ld2` pattern recognition.
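
As a sanity check that the two IR forms above compute the same bytes, here is a minimal Python sketch. It assumes a little-endian target, and the helpers (`shufflevector`, `bitcast_i16_to_i8`, `widen_mask`) are hypothetical stand-ins for the IR operations, not LLVM APIs.

```python
def shufflevector(vec, mask):
    # Single-input shuffle: pick lanes of `vec` by index.
    return [vec[i] for i in mask]

def bitcast_i16_to_i8(vec):
    # Little-endian assumption: the low byte of each i16 comes first.
    out = []
    for x in vec:
        out += [x & 0xFF, (x >> 8) & 0xFF]
    return out

def widen_mask(mask, scale):
    # Rewrite a mask over wide lanes as a mask over `scale` narrow lanes each.
    return [scale * i + k for i in mask for k in range(scale)]

src = [0x0100, 0x0302, 0x0504, 0x0706, 0x0908, 0x0B0A, 0x0D0C, 0x0F0E]

for mask16 in ([0, 2, 4, 6], [1, 3, 5, 7]):
    # Original form: shuffle the <8 x i16>, then bitcast to <8 x i8>.
    before = bitcast_i16_to_i8(shufflevector(src, mask16))
    # Rewritten form: bitcast to <16 x i8>, then shuffle with the widened mask.
    after = shufflevector(bitcast_i16_to_i8(src), widen_mask(mask16, 2))
    assert before == after

assert widen_mask([0, 2, 4, 6], 2) == [0, 1, 4, 5, 8, 9, 12, 13]
assert widen_mask([1, 3, 5, 7], 2) == [2, 3, 6, 7, 10, 11, 14, 15]
```

The widened masks match the ones in the rewritten IR, so the transform is value-preserving; the rewritten masks just no longer look like a stride-2 deinterleave of 16-bit elements, which fits the guess that the `ld2` matching is what gets lost.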