On 29/01/2024 10:34, Tobias Burnus wrote:
Andrew wrote off list:
   "Vector reductions don't work on RDNA, as is, but they're
    supposed to be disabled by the insn condition"

This patch disables "fold_left_plus_<mode>", which is about
vectorization and in the code path shown in the backtrace.
I can also confirm manually that it fixes the ICE I saw and
also the ICE for the testfile that Richard's PR shows at the
end of his backtrace.  (-O3 is needed to trigger the ICE.)

OK for mainline?

OK.

Tobias

* * *

PS: We could add testcase(s) that is/are explicitly compiled with
gfx1100 and/or gfx1030 + '-O3' to ensure that this gets tested
with AMDGPU enabled, but I am not sure whether it is really worthwhile.


PPS: Running the testsuite, I see the following fails with
gfx1100 offloading:

FAIL: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors)
Excess errors:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
          .amdhsa_next_free_vgpr        516
                                        ^~~
[Obviously, likewise for libgomp.c++/../libgomp.c-c++-common/for-5.c]

FAIL: libgomp.c/pr104783-2.c execution test
FAIL: libgomp.c/pr104783.c execution test

(The .log unfortunately does not show more details)

FAIL: libgomp.fortran/optional-map.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/optional-map.f90   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.fortran/target1.f90   -O3 -g  (test for excess errors)

Same 'out of range' as above.

* * *

Manual testing shows for the two execution fails:

Memory access fault by GPU node-1 (Agent handle: 0x8d1aa0) on address (nil). Reason: Page not present or supervisor privilege.

Interestingly, it only fails with -O1 or higher; with -O0 it works.

Tobias

Hmm, supposedly gfx1100 has 768 registers, allocated in groups of 12 (groups of 8 on other devices), and these numbers have to be doubled for wavefrontsize64 because that field actually counts 32-lane registers. The ISA can only reference 256 registers, so the limit here should be 512. (The remaining registers are intended for other wavefronts to use.)

But 256 is not divisible by 12, and it looks like we've rounded up. I guess we need to set the limit at 252 (504 when doubled for wavefrontsize64) for gfx1100.
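To make the arithmetic concrete, here is a minimal sketch, assuming the numbers given above: a 256-register ISA limit, an allocation granule of 12 on gfx1100 (8 elsewhere), and a count that is doubled for wavefrontsize64. The helper names (`round_down`, `round_up`, `vgpr_limit`) are hypothetical and are not GCC's actual code; the point is only that rounding 256 up to a whole granule overshoots the limit, while rounding down gives 252 (504 doubled).

```python
def round_down(n: int, granule: int) -> int:
    """Largest multiple of `granule` that is <= n."""
    return (n // granule) * granule


def round_up(n: int, granule: int) -> int:
    """Smallest multiple of `granule` that is >= n."""
    return -(-n // granule) * granule


def vgpr_limit(isa_limit: int, granule: int, wave64: bool) -> int:
    """Hypothetical helper: register limit in 32-lane register units.

    Rounds the ISA-addressable count DOWN to whole allocation granules,
    then doubles it for wavefrontsize64 (assumption: the count is
    expressed in 32-lane registers, as described above).
    """
    usable = round_down(isa_limit, granule)
    return usable * 2 if wave64 else usable


if __name__ == "__main__":
    # gfx1100: a granule of 12 does not divide 256 evenly.
    print(round_up(256, 12))           # 264: rounding up overshoots 256
    print(vgpr_limit(256, 12, False))  # 252
    print(vgpr_limit(256, 12, True))   # 504
    # Other devices: a granule of 8 divides 256, giving the 512 limit.
    print(vgpr_limit(256, 8, True))    # 512
```

Rounding down is the safe direction here: any count that rounds up past the addressable 256 registers produces an out-of-range value like the one the assembler rejected.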

for-5.c is a register allocation nightmare!

Andrew
