[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs

2018-01-18 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #1 from Tom de Vries  ---
Using the patch from PR83920 comment 3 and testing libgomp.oacc-c/c.exp makes
the  libgomp.oacc-c/c.exp failures of this PR go away.

So, this might be a duplicate.

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #2 from Tom de Vries  ---
I've minimized mode-transitions.c to:
...
#define n 32

int
main (void)
{
  int arr_a[n];

#pragma acc parallel copyout(arr_a) num_gangs(1) num_workers(1)
vector_length(32)
  {
#pragma acc loop vector
for (int m = 0; m < 32; m++)
  ;

#pragma acc loop vector
for (int m = 0; m < 32; m++)
  arr_a[m] = 0;
  }
}
...

and the ptx to:
...
.version 3.1
.target sm_30
.address_size 64
.entry main$_omp_fn$0 (.param .u64 %in_ar0);
.entry main$_omp_fn$0 (.param .u64 %in_ar0)
{
  .reg .u64 %ar0;
  ld.param.u64 %ar0,[%in_ar0];

  .reg .pred %r36;
  {
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r36,%x,0;
  }

  .reg .u64 %r26;
  mov.u64 %r26,%ar0;

  @ %r36 bra $L5;
  $L5:

  {
.reg .u32 %r32;
.reg .u32 %r33;
mov.b64 {%r32,%r33},%r26;
shfl.idx.b32 %r32,%r32,0,31;
shfl.idx.b32 %r33,%r33,0,31;
mov.b64 %r26,{%r32,%r33};
  }

  ld.u64 %r26,[%r26];

  @ %r36 bra $L6;
  st.u32 [%r26],0;
 $L6:

  ret;
}
...

Either removing:
- the broad cast bit, which is an identity operation, or
- the redundant branch to $L5
make the test pass.

This looks like another nvidia driver problem (with driver version 384.111).

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> This looks like another nvidia driver problem (with driver version 384.111).

Confirmed.

The empty branch

>  @ %r36 bra $L5;
>  $L5:

is translated into:
...
/*0128*/   @P0 BRA `(.L_1);
.L_1:
...
so, no sync after the branch (or ssy before the branch).

Consequently, when executing the shfl.idx a bit later:
...
/*0158*/   SHFL.IDX PT, R0, R0, RZ, 0x1f;
/*0168*/   SHFL.IDX PT, R2, R2, RZ, 0x1f;
...
we are in divergent mode and get undefined results.

Inserting some sort of nop in the branched-around part:
...
  @ %r36 bra $L5;
{
  .reg .u32 %nop_src;
  .reg .u32 %nop_dst;
  mov.u32 %nop_dst, %nop_src;
}
  $L5:
...
makes the test pass, because then we generate:
...
/*0128*/   SSY `(.L_1);
/*0130*/   @P0 SYNC (*"TARGET= .L_1 "*);
/*0138*/   SYNC (*"TARGET= .L_1 "*);
.L_1:
...

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #4 from Tom de Vries  ---
Using this rudimentary workaround, I got the failing tests of this PR passing
again:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index afb0e4dd185..3ac28b3d903 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -78,6 +78,7 @@
 #include "target-def.h"

 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1

 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -4431,6 +4432,12 @@ nvptx_reorg (void)
   if (TARGET_UNIFORM_SIMT)
 nvptx_reorg_uniform_simt ();

+#if WORKAROUND_PTXJIT_BUG_2
+  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+if (LABEL_P (insn))
+  emit_insn_before (gen_fake_nop (), insn);
+#endif
+
   regstat_free_n_sets_and_refs ();

   df_finish_pass (true);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index f9c087b6d22..909484c329a 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -994,6 +994,15 @@
   ""
   "")

+(define_insn "fake_nop"
+  [(const_int 1)]
+  ""
+  "{
+ .reg .u32 %%nop_src;
+ .reg .u32 %%nop_dst;
+ mov.u32 %%nop_dst, %%nop_src;
+   }")
+
 (define_insn "return"
   [(return)]
   ""
...

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #5 from Tom de Vries  ---
Using the workaround, I get pretty good results:
...
Running /home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/c.exp
...
FAIL: libgomp.c/target-32.c (test for excess errors)
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test
FAIL: libgomp.c/target-link-1.c (test for excess errors)
FAIL: libgomp.c/thread-limit-2.c (test for excess errors)
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.c++/c++.exp ...
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/fortran.exp
...
FAIL: libgomp.fortran/target2.f90   -O0  execution test
FAIL: libgomp.fortran/target2.f90   -O1  execution test
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.hsa.c/c.exp ...
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-c/c.exp ...
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running
/home/vries/openacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary ===

# of expected passes8731
# of unexpected failures7
# of unresolved testcases   3
# of unsupported tests  240
...

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #6 from Jakub Jelinek  ---
(In reply to Tom de Vries from comment #4)
> Using this rudimentary workaround, I got the failing tests of this PR
> passing again:
> ...
> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index afb0e4dd185..3ac28b3d903 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -78,6 +78,7 @@
>  #include "target-def.h"
>  
>  #define WORKAROUND_PTXJIT_BUG 1
> +#define WORKAROUND_PTXJIT_BUG_2 1
>  
>  /* The various PTX memory areas an object might reside in.  */
>  enum nvptx_data_area
> @@ -4431,6 +4432,12 @@ nvptx_reorg (void)
>if (TARGET_UNIFORM_SIMT)
>  nvptx_reorg_uniform_simt ();
>  
> +#if WORKAROUND_PTXJIT_BUG_2
> +  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +if (LABEL_P (insn))
> +  emit_insn_before (gen_fake_nop (), insn);
> +#endif
> +
>regstat_free_n_sets_and_refs ();
>  
>df_finish_pass (true);
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index f9c087b6d22..909484c329a 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -994,6 +994,15 @@
>""
>"")
>  
> +(define_insn "fake_nop"
> +  [(const_int 1)]
> +  ""
> +  "{
> + .reg .u32 %%nop_src;
> + .reg .u32 %%nop_dst;
> + mov.u32 %%nop_dst, %%nop_src;
> +   }")
> +
>  (define_insn "return"
>[(return)]
>""
> ...

Shouldn't it be sufficient to emit this only for JUMP_Ps that jump to
immediately following LABEL_Ps (without intervening non-note insns)?

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #7 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #6)
> (In reply to Tom de Vries from comment #4)
> > Using this rudimentary workaround, I got the failing tests of this PR
> > passing again:

> Shouldn't it be sufficient to emit this only for JUMP_Ps that jump to
> immediately following LABEL_Ps (without intervening non-note insns)?

The emitted code actually looks like:
...
@ %r36 bra $L5;
// join x;
// fork x;
$L5:
...
so we'll have to skip over this pattern as well (given that the join are fork
are individual insns) .

But indeed, this is a rudimentary workaround, not a minimal one.

[Bug libgomp/83589] [nvptx] mode-transitions.c and private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0

2018-01-22 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83589

--- Comment #8 from Tom de Vries  ---
Created attachment 43209
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43209&action=edit
Patch inserting fake_nop only in case of branch-around-nothing