Issue 176406
Summary [AMDGPU] Make better use of v_perm_b32 for transpose-like operations on multiple dwords
Labels enhancement, backend:AMDGPU
Assignees jayfoad
Reporter jayfoad
    [ideal_perm.txt](https://github.com/user-attachments/files/24675043/ideal_perm.txt) shows desired codegen for a transpose-like operation on 16 bytes packed into 4 dwords.

[perm.txt](https://github.com/user-attachments/files/24675044/perm.txt) and [good_shuffle_perm.txt](https://github.com/user-attachments/files/24675045/good_shuffle_perm.txt) are attempts to represent this in IR, first with extractelement/insertelement and then with shufflevector. In both cases the compiler fails to make use of v_perm_b32 and instead generates a longer sequence of shifts and ORs with SDWA e.g.:
```
$ llc -mtriple=amdgcn -mcpu=gfx900 perm.ll -o -
...
	v_lshlrev_b16_e32 v16, 8, v16
	v_or_b32_sdwa v8, v8, v16 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v16, 8, v20
	v_or_b32_sdwa v12, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_or_b32_sdwa v8, v8, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v18
	v_or_b32_sdwa v10, v10, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v22
	v_or_b32_sdwa v12, v14, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_or_b32_sdwa v10, v10, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v17
	v_or_b32_sdwa v9, v9, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v21
	v_or_b32_sdwa v12, v13, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_or_b32_sdwa v9, v9, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v19
	v_or_b32_sdwa v11, v11, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_lshlrev_b16_e32 v12, 8, v23
	v_or_b32_sdwa v12, v15, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	v_or_b32_sdwa v11, v11, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
...
```
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to