Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

Andrew Stubbs Mon, 29 Jan 2024 07:17:41 -0800

On 29/01/2024 12:50, Tobias Burnus wrote:

Andrew Stubbs wrote:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
.amdhsa_next_free_vgpr 516
                       ^~~
[Obviously, likewise for libgomp.c++/..
Hmm, supposedly there are 768 registers allocated in groups of 12, on gfx1100 (8 on other devices), which number you have to double on wavefrontsize64 because that field actually counts the number of 32-lane registers. The ISA can only actually reference 256 registers, so the limit here should be 512. (The remaining registers are intended for other wavefronts to use.)
But 256 is not divisible by 12, and it looks like we've rounded up. I guess we need to set the limit at 252 (504), for gfx1100.
BTW: The LLVM source code has,
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1066

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
   if (STI->getFeatureBits().test(FeatureGFX90AInsts))
     return 512;
   if (!isGFX10Plus(*STI))
     return 256;
   bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
   if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
     return IsWave32 ? 1536 : 768;
   return IsWave32 ? 1024 : 512;
}


That matches what we have in libgomp.

LLVM must have another configuration somewhere for how many registers it can actually use in code (the ISA can encode 256, but that doesn't mean it should always do so). This may be a moot point because allowing too many registers limits how many threads can run in parallel, so they may have chosen to impose an artificial limit at all times.

In GCC, non-kernel functions are limited to 24 registers (for maximum occupancy -- we could probably increase that 50% on "GFX11Full" devices), but the kernel entry point is permitted to go crazy.


Andrew
