https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81352
            Bug ID: 81352
           Summary: [nvptx] trap placement for non-returning function call in
                    vector-single mode
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the following fortran test-case:
...
program foo
  integer :: a(2,2,4), k, kk, kkk
  a = 2
  !$acc parallel num_gangs(1) num_workers(1) vector_length(32)
  !$acc loop gang(static:1)
  do k=1,2
     if (any(a(k,1:2,1:4).ne.2)) call abort
  enddo
  !$acc end parallel
end program
...

We generate the following ptx for -O2:
...
  {
    .reg .u32 %x;
    mov.u32 %x,%tid.x;
    setp.ne.u32 %r47,%x,0;
  }
  ...
  @ %r47 bra $L12;
  {
    call _gfortran_abort;
    trap;
  }
$L12:
$L3:
...

which results in this SASS generated with -O4:
...
        /*01f8*/      ISETP.NE.U32.AND P0, PT, R21, RZ, PT;
        /*0208*/      SSY `(.L_25);
        /*0210*/  @P0 BRA `(.L_26);
        /*0218*/      JCAL `(_gfortran_abort);
        /*0220*/      BPT.TRAP 0x1;
.L_26:
        /*0228*/      NOP.S;
.L_25:
...

We have to think of the diverging branch '@P0 BRA `(.L_26)' as executing both paths, one after the other, with different threads in the warp enabled, without any guarantee which path will be executed first. Each path ends at an instruction with the .S suffix, and after the execution of the second path, converged execution is resumed at the target of the SSY insn: .L_25.

So either we execute this sequence:

1. branch taken path first:
        /*0208*/      SSY `(.L_25);
        /*0210*/  @P0 BRA `(.L_26);
        /*0228*/      NOP.S;
        /*0218*/      JCAL `(_gfortran_abort);

or this one:

2. branch not taken path first:
        /*0208*/      SSY `(.L_25);
        /*0210*/  @P0 BRA `(.L_26);
        /*0218*/      JCAL `(_gfortran_abort);

In both cases, we reliably end up at _gfortran_abort at runtime. So, the generated code is not wrong.

But it looks fragile to me. What guarantees that a sync point is in fact generated? That's not so clear, given that there's no clear post-domination point, because one of the paths ends in a trap.
And if no sync point is generated, execution can fall through and execute random code after $L12.

Furthermore, the trap placement right after the _gfortran_abort call is accurate in the sense that the call is what does not return, so that's where the control flow barrier is. But it's clear that the code does not intend to execute past $L3 in any of the other threads in the warp either.

So, the clearer (and possibly more reliable) trap placement is this:
...
  @ %r47 bra $L12;
  {
    call _gfortran_abort;
  }
$L12:
  trap;
$L3:
...

Resulting in this SASS code:
...
        /*01f8*/      ISETP.NE.U32.AND P0, PT, R21, RZ, PT;
        /*0208*/      SSY `(.L_26);
        /*0210*/  @P0 BRA `(.L_27);
        /*0218*/      JCAL `(_gfortran_abort);
.L_27:
        /*0220*/      NOP.S;
.L_26:
        /*0228*/      BPT.TRAP 0x1;
...

Note that the trap is placed after the sync point .L_26.
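
For illustration only, the two execution orders described for the currently generated SASS can be replayed in a small toy model. This is a sketch of the informal "one path after the other" description, not of real hardware behaviour: the `run` helper and the path names are hypothetical, the common SSY/BRA prefix is left out, and the call to _gfortran_abort is assumed never to return.

```python
# Toy model of the diverged branch; NOT a model of actual GPU hardware.
ABORT = "JCAL _gfortran_abort"  # assumption: this call never returns


def run(paths, order):
    """Execute the two sides of a diverged branch one after the other.

    paths maps a (hypothetical) path name to its instruction list and
    order gives the sequence in which the paths are executed.
    Returns the trace of executed instructions.
    """
    trace = []
    for name in order:
        for insn in paths[name]:
            trace.append(insn)
            if insn == ABORT:
                return trace  # non-returning call: nothing runs after it
            if insn == "NOP.S":
                break  # this path is done and waits at the sync point
    return trace


# Currently generated code: the trap sits right after the call.
paths = {
    "taken": ["NOP.S"],                 # threads with %tid.x != 0
    "not_taken": [ABORT, "BPT.TRAP 0x1"],  # thread with %tid.x == 0
}

taken_first = run(paths, ["taken", "not_taken"])
not_taken_first = run(paths, ["not_taken", "taken"])
print(taken_first)
print(not_taken_first)
```

Under these assumptions both orders end at the _gfortran_abort call, which matches sequences 1 and 2 above. The model says nothing about what happens if no sync point is generated, which is the fragile case this report is about.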