================
@@ -6886,18 +6896,24 @@
SITargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
ST.getGeneration() >= AMDGPUSubtarget::GFX12
? AMDGPU::V_ADD_F64_pseudo_e64
: AMDGPU::V_ADD_F64_e64);
+ case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B16_t16:
+ return lowerWaveReduce(MI, *BB, *getSubtarget(),
+ AMDGPU::V_AND_B16_t16_e64);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B16:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_AND_B32);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_AND_B32);
case AMDGPU::WAVE_REDUCE_AND_PSEUDO_B64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_AND_B64);
+ case AMDGPU::WAVE_REDUCE_OR_PSEUDO_B16_t16:
+ return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::V_OR_B16_t16_e64);
case AMDGPU::WAVE_REDUCE_OR_PSEUDO_B16:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_OR_B32);
case AMDGPU::WAVE_REDUCE_OR_PSEUDO_B32:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_OR_B32);
case AMDGPU::WAVE_REDUCE_OR_PSEUDO_B64:
return lowerWaveReduce(MI, *BB, *getSubtarget(), AMDGPU::S_OR_B64);
+ case AMDGPU::WAVE_REDUCE_XOR_PSEUDO_B16_t16:
+ return lowerWaveReduce(MI, *BB, *getSubtarget(),
+ AMDGPU::V_XOR_B16_t16_e64);
----------------
Sisyph wrote:
In general, 32-bit SALU instructions should be preferable to 16-bit VALU
instructions: they are more efficient, and VGPRs are generally more precious
than SGPRs, I think. In more complicated cases, SGPR pressure or the downstream
effects of promoting the types may make the choice less clear.
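
A minimal sketch of why promotion is safe for these three operations, written as plain host-side C++ rather than LLVM code: for AND, OR and XOR, each result bit depends only on the same bit of the inputs, so the low 16 bits of a 32-bit reduction match the 16-bit reduction whatever the upper halves hold. The function names, the `std::vector` standing in for the active lanes, and the `HighBits` parameter are all hypothetical and only model the idea; they are not part of `lowerWaveReduce`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the 16-bit path: AND together one 16-bit value
// per active lane, keeping the accumulator 16 bits wide.
uint16_t reduceAnd16(const std::vector<uint16_t> &Lanes) {
  uint16_t Acc = 0xFFFF; // identity element for AND
  for (uint16_t V : Lanes)
    Acc &= V;
  return Acc;
}

// Hypothetical model of the promoted path: each lane value sits in the low
// half of a 32-bit register whose high half holds arbitrary bits (HighBits).
// The reduction runs at 32 bits and the result is truncated at the end.
uint16_t reduceAnd16Via32(const std::vector<uint16_t> &Lanes,
                          uint16_t HighBits) {
  uint32_t Acc = 0xFFFFFFFFu; // identity element for 32-bit AND
  for (uint16_t V : Lanes)
    Acc &= (static_cast<uint32_t>(HighBits) << 16) | V;
  return static_cast<uint16_t>(Acc); // keep only the low 16 bits
}
```

The same argument does not carry over to reductions such as add, min or max, where the upper bits can influence the low half or the comparison, so those would need explicit extension before promotion.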
https://github.com/llvm/llvm-project/pull/194813