Issue 179013
Summary [AArch64] vextq_u8 expands into two EXT instructions in some cases
Labels new issue
Assignees
Reporter zeux
When compiling the attached file ([test.cpp](https://github.com/user-attachments/files/24977466/test.cpp)) with -O2/-O3 for AArch64, LLVM generates a loop that has 3 EXT instructions; the second EXT in the code is expanded into a two-EXT sequence before the store:

```asm
        ext     v3.16b, v2.16b, v2.16b, #8
        ext     v1.8b, v1.8b, v3.8b, #7
        str     d1, [x0], #8
```

This is new as of LLVM 20; LLVM 19 generated one EXT instead:

```asm
        ext     v2.16b, v1.16b, v1.16b, #7
        str     d2, [x0], #8
```

I'm not sure to what extent this affects performance in the larger code from which this repro was extracted; llvm-mca claims that the loop in test.cpp gets 2 cycles slower (4.2 => 6.2). The extra EXT instruction appears to be entirely redundant.

Replacing `vst1_u8` in the code with `vst1q_lane_u64` (with appropriate casts) seems to work around the issue, although it generates a differently flavored store so I'm not sure if it has other consequences.
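
For illustration, the two store flavors might look roughly like the sketch below. This is not the code from test.cpp: the function names, the self-rotate by 7 bytes, and the load are assumptions made up for this example, and it is not guaranteed to reproduce the extra EXT. The cast to `uint64_t*` in the second variant also assumes the destination may legally be accessed as a 64-bit value. A scalar fallback is included only so the sketch builds on non-AArch64 hosts.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>

// Hypothetical stand-in for the reported pattern: rotate a 16-byte vector
// by 7 bytes with vextq_u8, then store the low 8 bytes with vst1_u8.
inline void store_rotated_vst1(const uint8_t* src, uint8_t* dst) {
    uint8x16_t v = vld1q_u8(src);
    uint8x16_t r = vextq_u8(v, v, 7);  // r[i] = v[(i + 7) % 16]
    vst1_u8(dst, vget_low_u8(r));      // writes r[0..7], i.e. v[7..14]
}

// Workaround flavor: view the vector as 2 x u64 and store lane 0.
// Assumption: dst is suitably aligned to be accessed through uint64_t*.
inline void store_rotated_lane(const uint8_t* src, uint8_t* dst) {
    uint8x16_t v = vld1q_u8(src);
    uint8x16_t r = vextq_u8(v, v, 7);
    vst1q_lane_u64(reinterpret_cast<uint64_t*>(dst),
                   vreinterpretq_u64_u8(r), 0);
}
#else
// Scalar fallback with the same observable result (bytes 7..14 of src).
inline void store_rotated_vst1(const uint8_t* src, uint8_t* dst) {
    std::memcpy(dst, src + 7, 8);
}
inline void store_rotated_lane(const uint8_t* src, uint8_t* dst) {
    std::memcpy(dst, src + 7, 8);
}
#endif
```

Both variants write the same 8 bytes; only the store intrinsic differs, so comparing their generated code is one way to check whether the workaround applies to a given loop.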

Godbolt link for ease of experimentation: https://gcc.godbolt.org/z/4q8n5qbzK