https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81352

            Bug ID: 81352
           Summary: [nvptx] trap placement for non-returning function call
                    in vector-single mode
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the following Fortran test-case:
...
program foo
  integer :: a(2,2,4), k, kk, kkk
  a = 2

  !$acc parallel num_gangs(1) num_workers(1) vector_length(32)                  

  !$acc loop gang(static:1)                                                     
  do k=1,2
     if (any(a(k,1:2,1:4).ne.2)) call abort
  enddo
  !$acc end parallel                                                            

end program
...

We generate the following PTX for -O2:
...
        {
                .reg .u32 %x;
                mov.u32 %x,%tid.x;
                setp.ne.u32 %r47,%x,0;
        }

        ...

        @ %r47 bra $L12;
        {
                call _gfortran_abort;
                trap;
        }
$L12:
$L3:
...

which results in this SASS generated with -O4:
...
        /*01f8*/                   ISETP.NE.U32.AND P0, PT, R21, RZ, PT;
        /*0208*/                   SSY `(.L_25);
        /*0210*/               @P0 BRA `(.L_26);
        /*0218*/                   JCAL `(_gfortran_abort);
        /*0220*/                   BPT.TRAP 0x1;
.L_26:
        /*0228*/                   NOP.S;
.L_25:
...

We have to think of the diverging branch '@P0 BRA `(.L_26)' as executing both
paths, one after the other, with different threads in the warp enabled, without
any guarantee which path will be executed first. 

The paths end at an instruction with the .S suffix (here NOP.S), and after the
execution of the second path, converged execution is resumed at the target of
the SSY insn: .L_25.

So either we execute this sequence:
1. branch taken path first:
        /*0208*/                   SSY `(.L_25);
        /*0210*/               @P0 BRA `(.L_26);
        /*0228*/                   NOP.S;
        /*0218*/                   JCAL `(_gfortran_abort);

or this one:
2. branch not taken path first:
        /*0208*/                   SSY `(.L_25);
        /*0210*/               @P0 BRA `(.L_26);
        /*0218*/                   JCAL `(_gfortran_abort);

In both cases, we reliably end up at _gfortran_abort at runtime. So, the
generated code is not wrong.
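
The two orderings above can be replayed with a small toy model. This is only a
sketch under stated assumptions, not a SASS interpreter: the run() function,
the PROGRAM/LABELS tables and the rule that execution stops at the call to
_gfortran_abort are all made up for illustration.

```python
# Toy model of the divergent branch in the SASS listing above.
# Assumption: the call to _gfortran_abort never returns, so the trace
# stops there. Instruction indices stand in for the SASS addresses.
PROGRAM = [
    ("SSY", ".L_25"),             # 0: record the reconvergence point
    ("@P0 BRA", ".L_26"),         # 1: divergent branch
    ("JCAL", "_gfortran_abort"),  # 2: non-returning call
    ("BPT.TRAP", None),           # 3: trap after the call
    ("NOP.S", None),              # 4: .L_26, ends the current path
]
LABELS = {".L_26": 4, ".L_25": 5}  # .L_25 lies just past the listing


def run(taken_first):
    """Return the opcodes executed for one ordering of the two paths."""
    trace = []
    pc = 0
    sync_target = None
    pending = []  # start of the path that still has to run
    while pc < len(PROGRAM):
        op, arg = PROGRAM[pc]
        trace.append(op)
        if op == "SSY":
            sync_target = LABELS[arg]
            pc += 1
        elif op == "@P0 BRA":
            taken, fallthrough = LABELS[arg], pc + 1
            if taken_first:
                pc, other = taken, fallthrough
            else:
                pc, other = fallthrough, taken
            pending.append(other)
        elif op in ("JCAL", "BPT.TRAP"):
            break  # abort or trap: nothing after this is executed
        elif op == "NOP.S":
            # End of a path: run the other path, or reconverge.
            pc = pending.pop() if pending else sync_target
    return trace


for taken_first in (True, False):
    print(taken_first, run(taken_first))
```

In this model both orderings end at the JCAL, matching sequences 1 and 2 above.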

But it looks fragile to me. What guarantees that a sync point is in fact
generated? That is not so clear, given that there is no clear post-domination
point, because one of the paths ends in a trap. And if no sync point is
generated, execution can fall through and execute random code after $L12.

Furthermore, the trap placement right after the _gfortran_abort call is
accurate in the sense that the call is what does not return, so that is where
the control flow barrier is. But it is equally clear that the code does not
intend to execute past $L3 in any of the other threads in the warp.

So, the clearer (and possibly more reliable) trap placement is this:
...
        @ %r47 bra $L12;
        {
                call _gfortran_abort;
        }
$L12:
                trap;
$L3:
...

Resulting in this SASS code:
...
        /*01f8*/                   ISETP.NE.U32.AND P0, PT, R21, RZ, PT;
        /*0208*/                   SSY `(.L_26);
        /*0210*/               @P0 BRA `(.L_27);
        /*0218*/                   JCAL `(_gfortran_abort);
.L_27:
        /*0220*/                   NOP.S;
.L_26:
        /*0228*/                   BPT.TRAP 0x1;
...

Note that the trap is placed after the sync point .L_26.
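
To illustrate the fall-through concern, here is a toy comparison of the two
trap placements in the hypothetical case where no sync point is generated. It
is a sketch only: the taken_path() helper and the "<code after $L3>"
placeholder are made up, and the model simply runs the branch-taking threads
straight on from $L12.

```python
# Hypothetical scenario: no sync point exists, so threads that take the
# branch to $L12 just continue in program order from there.
CURRENT = ["@ %r47 bra $L12", "call _gfortran_abort", "trap",
           "$L12:", "$L3:", "<code after $L3>"]
PROPOSED = ["@ %r47 bra $L12", "call _gfortran_abort",
            "$L12:", "trap", "$L3:", "<code after $L3>"]


def taken_path(program, target="$L12:"):
    """Instructions executed by the branch-taking threads from target on."""
    executed = []
    for insn in program[program.index(target) + 1:]:
        if insn.endswith(":"):
            continue  # labels are not instructions
        executed.append(insn)
        if insn == "trap":
            break  # trap stops execution
    return executed


print(taken_path(CURRENT))   # reaches the code after $L3
print(taken_path(PROPOSED))  # stops at the trap
```

Under these assumptions, only the proposed placement keeps the branch-taking
threads from running into the code after $L3.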
