https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #0)
> All these solutions work until the next failure shows up. It would be nice
> to fix this more definitely in some way, but I'm not sure how.

We could try to figure out the frame size of the recursive function.

Using GOMP_DEBUG=1 we see the JIT compile/link log:
...
Link log warning : Stack size for entry function 'main$_omp_fn$0' cannot be statically determined
info    : 0 bytes gmem
info    : Function properties for 'main$_omp_fn$0':
info    : used 8 registers, 0 stack, 0 bytes smem, 328 bytes cmem[0], 0 bytes lmem
...
but the stack size is only shown for the offloading region, not for individual functions.

Using GOMP_NVPTX_SAVE_TEMPS=1 we could get the cubin, and dump the resource usage:
...
$ cuobjdump -res-usage gomp-nvptx.*.cubin

Resource usage:
 Common:
  GLOBAL:0
 Function rec:
  REG:8 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
 Function main$_omp_fn$0:
  REG:8 STACK:UNKNOWN SHARED:0 LOCAL:0 CONSTANT[0]:328 TEXTURE:0 SURFACE:0 SAMPLER:0
...
but the STACK entry for rec shows up as 0.

Finally, using nvdisasm (or GOMP_NVPTX_DISASM=1) we find the info:
...
$ nvdisasm gomp-nvptx.*.cubin
        //----- nvinfo : EIATTR_FRAME_SIZE
        .align 4
        /*0000*/ .byte 0x04, 0x11
        /*0002*/ .short (.L_6 - .L_5)
        .align 4
.L_5:
        /*0004*/ .word index@(rec)
        /*0008*/ .word 0x00000010   <<<<<<<<

        //----- nvinfo : EIATTR_FRAME_SIZE
        .align 4
.L_6:
        /*000c*/ .byte 0x04, 0x11
        /*000e*/ .short (.L_8 - .L_7)
        .align 4
.L_7:
        /*0010*/ .word index@(main$_omp_fn$0)
        /*0014*/ .word 0x00000000

        //----- nvinfo : EIATTR_MIN_STACK_SIZE
        .align 4
.L_8:
        /*0018*/ .byte 0x04, 0x12
        /*001a*/ .short (.L_10 - .L_9)
        .align 4
.L_9:
        /*001c*/ .word index@(main$_omp_fn$0)
        /*0020*/ .word 0xffffffff
.L_10:
...

So, we could write some tcl function to get the frame size for a function, and xfail or skip the test if the frame size is bigger than a given constant x, but AFAIK dejagnu is not set up for this.
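As a rough illustration of the frame-size lookup described above, here is a minimal sketch in Python rather than tcl. It assumes the nvdisasm text has the shape shown in the dump above (an "nvinfo : EIATTR_FRAME_SIZE" marker, then a ".word index@(name)" line, then a ".word 0x..." value). The function name parse_frame_sizes and the SAMPLE string are made up for this example and are not part of any existing testsuite helper.

```python
import re

# Excerpt of the nvdisasm output quoted in this comment, used as sample input.
SAMPLE = """
//----- nvinfo : EIATTR_FRAME_SIZE
.align 4
/*0000*/ .byte 0x04, 0x11
/*0002*/ .short (.L_6 - .L_5)
.align 4
.L_5:
/*0004*/ .word index@(rec)
/*0008*/ .word 0x00000010
//----- nvinfo : EIATTR_FRAME_SIZE
.align 4
.L_6:
/*000c*/ .byte 0x04, 0x11
/*000e*/ .short (.L_8 - .L_7)
.align 4
.L_7:
/*0010*/ .word index@(main$_omp_fn$0)
/*0014*/ .word 0x00000000
//----- nvinfo : EIATTR_MIN_STACK_SIZE
.align 4
.L_8:
/*0018*/ .byte 0x04, 0x12
/*001a*/ .short (.L_10 - .L_9)
.align 4
.L_9:
/*001c*/ .word index@(main$_omp_fn$0)
/*0020*/ .word 0xffffffff
.L_10:
"""


def parse_frame_sizes(nvdisasm_text):
    """Return {function name: frame size in bytes} from EIATTR_FRAME_SIZE records.

    Hypothetical helper: it relies only on the textual layout seen in the
    dump above, not on any documented nvdisasm output format.
    """
    sizes = {}
    in_frame_size = False  # True while inside an EIATTR_FRAME_SIZE record
    name = None            # function named by the last index@(...) line
    for line in nvdisasm_text.splitlines():
        marker = re.search(r"nvinfo\s*:\s*(\w+)", line)
        if marker:
            in_frame_size = marker.group(1) == "EIATTR_FRAME_SIZE"
            name = None
            continue
        if not in_frame_size:
            continue
        index = re.search(r"\.word\s+index@\(([^)]+)\)", line)
        if index:
            name = index.group(1)
            continue
        value = re.search(r"\.word\s+(0x[0-9a-fA-F]+)", line)
        if value and name is not None:
            sizes[name] = int(value.group(1), 16)
            name = None
    return sizes


if __name__ == "__main__":
    for func, size in parse_frame_sizes(SAMPLE).items():
        print(func, size)
```

On the dump above this yields a frame size of 16 bytes (0x10) for rec and 0 for main$_omp_fn$0; the EIATTR_MIN_STACK_SIZE record is ignored.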
The best we could do is to add a dg-final check and emit a:
...
PASS: rec.c dg-nvptx-frame-size-check main$_omp_fn$0 0
FAIL: rec.c dg-nvptx-frame-size-check rec 8
...

Or, going for a more precise check:
...
FAIL: rec.c dg-nvptx-stack-size-check main$_omp_fn$0,rec=65 (peak stack size 1048 is larger than stack size limit 1024)
...
where you then check that frame-size (main$_omp_fn$0) + 65 * frame-size (rec) is smaller than the stack size limit reported by cudaThreadGetLimit (&size, cudaLimitStackSize).

Presumably formulating the peak stack composition gets more involved with openmp test cases which have a more complicated call stack.
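The peak-stack comparison could be sketched as below. This is only a model under stated assumptions: the "name[=count]" spec syntax is read off the illustrative dg-nvptx-stack-size-check line above (a directive that does not exist yet), the helper names are hypothetical, the frame sizes are the EIATTR_FRAME_SIZE values from the nvdisasm dump, and the limit of 1024 is taken from the example message rather than queried from the CUDA runtime. The model counts only the reported frame sizes and no other per-call overhead.

```python
def parse_call_spec(spec):
    """Parse a hypothetical 'f,g=65' spec into {function: number of live frames}.

    A name without '=N' is assumed to contribute a single frame.
    """
    counts = {}
    for item in spec.split(","):
        name, sep, n = item.partition("=")
        counts[name.strip()] = int(n) if sep else 1
    return counts


def peak_stack_size(frame_sizes, call_counts):
    """Sum frame size * live frame count over all functions in the spec."""
    return sum(frame_sizes[func] * n for func, n in call_counts.items())


def check_stack(frame_sizes, spec, limit):
    """Return (passed, peak): passed is True if the modelled peak is below limit."""
    peak = peak_stack_size(frame_sizes, parse_call_spec(spec))
    return peak < limit, peak


# Frame sizes in bytes, as read from the EIATTR_FRAME_SIZE records.
FRAME_SIZES = {"main$_omp_fn$0": 0, "rec": 0x10}

if __name__ == "__main__":
    passed, peak = check_stack(FRAME_SIZES, "main$_omp_fn$0,rec=65", 1024)
    status = "PASS" if passed else "FAIL"
    print(f"{status}: peak stack size {peak}, stack size limit 1024")
```

A more complicated call stack would need one entry per function that can be live at the same time, which is where writing the spec by hand is likely to get tedious.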