https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893

            Bug ID: 104893
           Summary: [nvptx] Handle Independent Thread Scheduling for
                    sm_70+ with -msoft-stack
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

We use -msoft-stack for OpenMP programs:
...
'-msoft-stack'
     Generate code that does not use '.local' memory directly for stack
     storage.  Instead, a per-warp stack pointer is maintained
     explicitly.  This enables variable-length stack allocation (with
     variable-length arrays or 'alloca'), and when global memory is used
     for underlying storage, makes it possible to access automatic
     variables from other threads, or with atomic instructions.
...

Starting with sm_70, we have Independent Thread Scheduling: "the GPU maintains
execution state per thread, including a program counter and call stack".

The per-thread call stack is handled for .local memory by the CUDA driver.

For the 'soft stack' that's not the case: the stack pointer is maintained per
warp, not per thread.  So once the threads in a warp no longer execute in
lockstep, different threads may read and write values at a stack address that
is meant to be thread private, but which in reality is shared between all
threads in the warp.
