[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-05-02 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> Committed to trunk.
> 
> Approved for 8.2. [ 8.1 release is targeted for Wednesday, May 2nd. ]

Backported to gcc-8-branch after 8.1 release.

Marking resolved-fixed.

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-05-02 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

--- Comment #6 from Tom de Vries  ---
Author: vries
Date: Wed May  2 10:55:07 2018
New Revision: 259834

URL: https://gcc.gnu.org/viewcvs?rev=259834&root=gcc&view=rev
Log:
backport "[nvptx, libgomp, testsuite] Reduce recursion depth in
declare_target-{1,2}.f90"

2018-05-02  Tom de Vries  

backport from trunk:
2018-04-26  Tom de Vries  

PR target/85519
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth from 25 to 23.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

Modified:
branches/gcc-8-branch/libgomp/ChangeLog
branches/gcc-8-branch/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
branches/gcc-8-branch/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-05-01 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cesar at gcc dot gnu.org

--- Comment #5 from Tom de Vries  ---
*** Bug 84871 has been marked as a duplicate of this bug. ***

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-04-26 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

--- Comment #4 from Tom de Vries  ---
Committed to trunk.

Approved for 8.2. [ 8.1 release is targeted for Wednesday, May 2nd. ]

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-04-26 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

--- Comment #3 from Tom de Vries  ---
Author: vries
Date: Thu Apr 26 13:26:09 2018
New Revision: 259674

URL: https://gcc.gnu.org/viewcvs?rev=259674&root=gcc&view=rev
Log:
[nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90

2018-04-26  Tom de Vries  

PR target/85519
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth from 25 to 23.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

Modified:
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
trunk/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-04-25 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |openacc, openmp, patch
           Severity|normal                      |trivial

--- Comment #2 from Tom de Vries  ---
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01122.html

[Bug target/85519] [nvptx, openacc, openmp, testsuite] Recursive tests may fail due to thread stack limit

2018-04-25 vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85519

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> All these solutions work until the next failure shows up. It would be nice
> to fix this more definitely in some way, but I'm not sure how.

We could try to figure out the frame size of the recursive function.

Using GOMP_DEBUG=1 we see the JIT compile/link log:
...
Link log warning : Stack size for entry function 'main$_omp_fn$0' cannot be statically determined
info: 0 bytes gmem
info: Function properties for 'main$_omp_fn$0':
info: used 8 registers, 0 stack, 0 bytes smem, 328 bytes cmem[0], 0 bytes lmem
...
but the stack size is only shown for the offloading region, not for individual
functions.

Using GOMP_NVPTX_SAVE_TEMPS=1 we could get the cubin, and dump the resource
usage:
...
$ cuobjdump -res-usage gomp-nvptx.*.cubin  

Resource usage:
 Common:
  GLOBAL:0
 Function rec:
  REG:8 STACK:0 SHARED:0 LOCAL:0 TEXTURE:0 SURFACE:0 SAMPLER:0
 Function main$_omp_fn$0:
  REG:8 STACK:UNKNOWN SHARED:0 LOCAL:0 CONSTANT[0]:328 TEXTURE:0 SURFACE:0 SAMPLER:0
...
but the STACK entry for rec shows up as 0.

Finally, using nvdisasm (or GOMP_NVPTX_DISASM=1) we find the info:
...
$ nvdisasm gomp-nvptx.*.cubin
//- nvinfo : EIATTR_FRAME_SIZE
.align  4
/**/.byte   0x04, 0x11
/*0002*/.short  (.L_6 - .L_5)
.align  4
.L_5:
/*0004*/.word   index@(rec)
/*0008*/.word   0x0010 


//- nvinfo : EIATTR_FRAME_SIZE
.align  4
.L_6:
/*000c*/.byte   0x04, 0x11
/*000e*/.short  (.L_8 - .L_7)
.align  4
.L_7:
/*0010*/.word   index@(main$_omp_fn$0)
/*0014*/.word   0x


//- nvinfo : EIATTR_MIN_STACK_SIZE
.align  4
.L_8:
/*0018*/.byte   0x04, 0x12
/*001a*/.short  (.L_10 - .L_9)
.align  4
.L_9:
/*001c*/.word   index@(main$_omp_fn$0)
/*0020*/.word   0x
.L_10:
...


So, we could write some tcl function to get the frame size for a function, and
xfail or skip the test if the frame size is bigger than a given constant x, but
AFAIK dejagnu is not set up for this. The best we could do is to add a dg-final
check and emit a:
...
PASS: rec.c dg-nvptx-frame-size-check main$_omp_fn$0 0
FAIL: rec.c dg-nvptx-frame-size-check rec 8
...
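
As a rough illustration of the frame-size lookup described above (a real
dejagnu check would have to be written in tcl), here is a minimal Python
sketch. The helper names frame_sizes and check_frame_sizes, the threshold of
8 bytes, and the spelled-out .word values in SAMPLE are all assumptions made
for this example; they are not taken from the report or from any existing
testsuite code.

```python
import re

# Text shaped like the nvdisasm excerpt above. The .word values are
# hypothetical, written out in full only so the sketch has something to parse.
SAMPLE = """\
//- nvinfo : EIATTR_FRAME_SIZE
.align  4
/*0000*/.byte   0x04, 0x11
/*0002*/.short  (.L_6 - .L_5)
.align  4
.L_5:
/*0004*/.word   index@(rec)
/*0008*/.word   0x00000010
//- nvinfo : EIATTR_FRAME_SIZE
.align  4
.L_6:
/*000c*/.byte   0x04, 0x11
/*000e*/.short  (.L_8 - .L_7)
.align  4
.L_7:
/*0010*/.word   index@(main$_omp_fn$0)
/*0014*/.word   0x00000000
"""

_INDEX = re.compile(r"\.word\s+index@\((.+)\)")
_VALUE = re.compile(r"\.word\s+(0x[0-9a-fA-F]+)")


def frame_sizes(text):
    """Map function name -> frame size (bytes) from EIATTR_FRAME_SIZE entries."""
    sizes = {}
    in_frame_attr = False
    func = None
    for line in text.splitlines():
        if line.startswith("//- nvinfo :"):
            # A new attribute starts; only EIATTR_FRAME_SIZE is of interest.
            in_frame_attr = line.split(":", 1)[1].strip() == "EIATTR_FRAME_SIZE"
            func = None
            continue
        if not in_frame_attr:
            continue
        m = _INDEX.search(line)
        if m:
            func = m.group(1)
            continue
        m = _VALUE.search(line)
        if m and func is not None:
            sizes[func] = int(m.group(1), 16)
            func = None
    return sizes


def check_frame_sizes(sizes, limit, test="rec.c"):
    """Emit one PASS/FAIL line per function, failing above the given limit."""
    return [
        f"{'PASS' if size <= limit else 'FAIL'}: {test} "
        f"dg-nvptx-frame-size-check {name} {size}"
        for name, size in sizes.items()
    ]


if __name__ == "__main__":
    for result in check_frame_sizes(frame_sizes(SAMPLE), limit=8):
        print(result)
```

The parser only tracks the index@(...) word and the value word that follows
it, which is enough for the two-word EIATTR_FRAME_SIZE layout shown in the
dump; other nvinfo attributes are skipped.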


Or, going for a more precise check:
...
FAIL: rec.c dg-nvptx-stack-size-check main$_omp_fn$0,rec=65 (peak stack size
1048 is larger than stack size limit 1024)
...
where you then check that frame-size (main$_omp_fn$0) + 65 * frame-size (rec)
is less than the stack size limit that cudaThreadGetLimit reports for
cudaLimitStackSize.

Presumably formulating the peak stack composition gets more involved with
openmp test cases which have a more complicated call stack.
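
The arithmetic behind such a stack-size check can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the frame
sizes (8 and 16 bytes) are made-up values chosen so that the result matches
the 1048-byte figure in the example message, the limit of 1024 is the one
quoted there rather than one queried from the device, and peak_stack_size and
stack_size_check are hypothetical names.

```python
def peak_stack_size(frame_size, chain):
    """Sum frame sizes along a call chain.

    chain is a list of (function, count) pairs, where count is how many
    frames of that function are live at the deepest point of the recursion.
    """
    return sum(frame_size[name] * count for name, count in chain)


def stack_size_check(frame_size, chain, limit, test="rec.c"):
    """Return a PASS/FAIL line; the check passes only if peak < limit."""
    peak = peak_stack_size(frame_size, chain)
    desc = ",".join(
        name if count == 1 else f"{name}={count}" for name, count in chain
    )
    if peak < limit:
        return f"PASS: {test} dg-nvptx-stack-size-check {desc}"
    return (
        f"FAIL: {test} dg-nvptx-stack-size-check {desc} "
        f"(peak stack size {peak} is not below stack size limit {limit})"
    )


if __name__ == "__main__":
    # Hypothetical frame sizes in bytes, not measured values.
    frame_size = {"main$_omp_fn$0": 8, "rec": 16}
    chain = [("main$_omp_fn$0", 1), ("rec", 65)]
    print(stack_size_check(frame_size, chain, limit=1024))
```

A linear chain like this covers only the simple recursive case; a test with
several call paths would need the maximum over all paths instead of a single
sum.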