Re: [PATCH] i386: Enable AVX/AVX512 features only if supported by OSXSAVE

2018-03-30 Thread Ilya Verbin
2018-03-30 20:56 GMT+03:00 H.J. Lu <hjl.to...@gmail.com>:
> On Fri, Mar 30, 2018 at 10:19 AM, Ilya Verbin <iver...@gmail.com> wrote:
>> This check will always disable AVX-512 on macOS, because they
>> implemented on-demand support:
>> https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L176
>>
>
> Isn't xsaveopt designed for this?

Maybe the goal was to reduce the size of the area allocated by default
for each thread.

> --
> H.J.

  -- Ilya


Re: [PATCH] i386: Enable AVX/AVX512 features only if supported by OSXSAVE

2018-03-30 Thread Ilya Verbin
This check will always disable AVX-512 on macOS, because they
implemented on-demand support:
https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L176

(I'm not against this change, just for information).

2018-03-29 16:05 GMT+03:00 Uros Bizjak :
> On Thu, Mar 29, 2018 at 2:43 PM, H.J. Lu  wrote:
>> Enable AVX and AVX512 features only if their states are supported by
>> OSXSAVE.

  -- Ilya
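
For readers following the thread, here is a minimal sketch of the kind of check
under discussion: deciding from an XCR0 value (as returned by XGETBV with
ECX = 0) whether the OS has enabled the AVX and AVX-512 register state.  The
bit positions are the architectural ones; the macro and function names are
made up for illustration and are not taken from the patch.  XCR0 is passed in
as a parameter so the logic can be tried on any host.

```c
#include <assert.h>
#include <stdint.h>

/* XCR0 state-component bits (architectural positions; names are illustrative). */
#define XSTATE_SSE        (1ull << 1)  /* XMM registers */
#define XSTATE_YMM        (1ull << 2)  /* upper halves of YMM0-15 */
#define XSTATE_OPMASK     (1ull << 5)  /* k0-k7 mask registers */
#define XSTATE_ZMM_HI256  (1ull << 6)  /* upper halves of ZMM0-15 */
#define XSTATE_HI16_ZMM   (1ull << 7)  /* ZMM16-31 */

/* Nonzero if the OS saves/restores the state that AVX needs. */
int os_enables_avx(uint64_t xcr0)
{
    const uint64_t need = XSTATE_SSE | XSTATE_YMM;
    return (xcr0 & need) == need;
}

/* Nonzero if the OS also saves/restores the AVX-512 state. */
int os_enables_avx512(uint64_t xcr0)
{
    const uint64_t need = XSTATE_OPMASK | XSTATE_ZMM_HI256 | XSTATE_HI16_ZMM;
    return os_enables_avx(xcr0) && (xcr0 & need) == need;
}

/* Small self-check of both predicates. */
void xcr0_self_check(void)
{
    assert(os_enables_avx(0x7));       /* x87 + SSE + YMM */
    assert(!os_enables_avx512(0x7));   /* no AVX-512 state bits set */
    assert(os_enables_avx512(0xe7));   /* all AVX-512 state bits set */
}
```

With an XCR0 of 0x7, which is what a system that enables the AVX-512 state on
demand would presumably report before first use, os_enables_avx512 returns 0
even when CPUID advertises AVX-512.  That is the macOS situation described
above.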


Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-18 Thread Ilya Verbin
2016-11-17 20:01 GMT+03:00 Jeff Law <l...@redhat.com>:
> On 11/17/2016 09:56 AM, Ilya Verbin wrote:
>>
>> 2016-11-17 18:50 GMT+03:00 Rainer Orth <r...@cebitec.uni-bielefeld.de>:
>>>
>>> Rainer Orth <r...@cebitec.uni-bielefeld.de> writes:
>>>
>>>> I happened to notice that my libcilkrts SPARC port has been applied
>>>> upstream.  So to reach closure on this issue for the GCC 7 release, I'd
>>>> like to import upstream into mainline which seems to be covered by the
>>>> free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
>>>> though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
>>>> specific and we have no listed maintainer.
>>>
>>>
>>> I initially used Ilya's intel.com address, which bounced.  Now using the
>>> current address listed in MAINTAINERS...
>>
>>
>> Yeah, I don't work for Intel anymore. And I'm not a libcilkrts
>> maintainer, so I can't approve it.
>
> Do you want to be?  IIRC I was going to nominate you, but held off knowing
> your situation was going to change.
>
> If you're interested in maintainer positions, I can certainly still nominate
> you.

I have little experience with this library, and no longer have a
connection with Cilk developers at Intel, so I'm not interested.

  -- Ilya


Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-17 Thread Ilya Verbin
2016-11-17 18:50 GMT+03:00 Rainer Orth :
> Rainer Orth  writes:
>
>> I happened to notice that my libcilkrts SPARC port has been applied
>> upstream.  So to reach closure on this issue for the GCC 7 release, I'd
>> like to import upstream into mainline which seems to be covered by the
>> free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
>> though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
>> specific and we have no listed maintainer.
>
> I initially used Ilya's intel.com address, which bounced.  Now using the
> current address listed in MAINTAINERS...

Yeah, I don't work for Intel anymore. And I'm not a libcilkrts
maintainer, so I can't approve it.

  -- Ilya

>> The following patch has passed x86_64-pc-linux-gnu bootstrap without
>> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
>> are currently running.
>>
>> Ok for mainline if they pass?
>
> Both Solaris bootstraps have completed successfully now.
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH, AVX-512, committed] Fix detection of AVX512IFMA in host_detect_local_cpu

2016-08-31 Thread Ilya Verbin
Hi!

I've committed this patch as obvious.


gcc/
* config/i386/driver-i386.c (host_detect_local_cpu): Fix detection of
AVX512IFMA.


Index: gcc/config/i386/driver-i386.c
===
--- gcc/config/i386/driver-i386.c (revision 239907)
+++ gcc/config/i386/driver-i386.c (working copy)
@@ -498,7 +498,7 @@
   has_avx512dq = ebx & bit_AVX512DQ;
   has_avx512bw = ebx & bit_AVX512BW;
   has_avx512vl = ebx & bit_AVX512VL;
-  has_avx512vl = ebx & bit_AVX512IFMA;
+  has_avx512ifma = ebx & bit_AVX512IFMA;

   has_prefetchwt1 = ecx & bit_PREFETCHWT1;
   has_avx512vbmi = ecx & bit_AVX512VBMI;


  -- Ilya
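
To illustrate what the one-line fix changes, here is a standalone sketch that
decodes a few AVX-512 feature bits from the EBX value of CPUID leaf 7,
subleaf 0.  The bit positions are the documented ones; the macro names, the
struct, and the function are hypothetical and only mirror the shape of the
code in host_detect_local_cpu.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits (names are illustrative). */
#define BIT_AVX512DQ    (1u << 17)
#define BIT_AVX512IFMA  (1u << 21)
#define BIT_AVX512BW    (1u << 30)
#define BIT_AVX512VL    (1u << 31)

struct avx512_flags {
    unsigned dq, ifma, bw, vl;
};

/* Each flag is taken from its own bit.  In the pre-fix code the IFMA bit
   was assigned to the VL flag, so VL was overwritten and IFMA never set. */
struct avx512_flags decode_leaf7_ebx(uint32_t ebx)
{
    struct avx512_flags f;
    f.dq   = (ebx & BIT_AVX512DQ) != 0;
    f.bw   = (ebx & BIT_AVX512BW) != 0;
    f.vl   = (ebx & BIT_AVX512VL) != 0;
    f.ifma = (ebx & BIT_AVX512IFMA) != 0;
    return f;
}

/* A CPU reporting only IFMA must not appear to have VL, and vice versa. */
void leaf7_self_check(void)
{
    assert(decode_leaf7_ebx(BIT_AVX512IFMA).ifma == 1);
    assert(decode_leaf7_ebx(BIT_AVX512IFMA).vl == 0);
    assert(decode_leaf7_ebx(BIT_AVX512VL).vl == 1);
}
```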


[PATCH, i386, AVX-512ER] vrsqrt28ps auto generation

2016-06-20 Thread Ilya Verbin
Hi!

This patch emits the vrsqrt28ps instruction in ix86_emit_swsqrtsf for the recip
case, and vrcp28ps(vrsqrt28ps(a)) for the !recip case.
Regtested using various benchmarks on an AVX-512ER machine.  OK for trunk?
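
The control flow described above can be sketched in scalar C.  This is only a
model under stated assumptions: rsqrt_model and rcp_model are hypothetical
stand-ins for the vrsqrt28ps and vrcp28ps estimates (they compute nearly
exact values rather than 28-bit estimates), and the model is meant for
positive inputs of moderate magnitude only.

```c
#include <assert.h>

/* Stand-in for the rsqrt28 estimate: Newton-Raphson iteration
   y <- y * (1.5 - 0.5 * a * y * y), which converges to 1/sqrt(a)
   from a starting guess at or below the root.  Positive a only. */
double rsqrt_model(double a)
{
    double y = a > 1.0 ? 1.0 / a : 1.0;
    for (int i = 0; i < 64; i++)
        y = y * (1.5 - 0.5 * a * y * y);
    return y;
}

/* Stand-in for the rcp28 estimate. */
double rcp_model(double x)
{
    return 1.0 / x;
}

/* recip != 0: res = rsqrt(a).  recip == 0: res = rcp(rsqrt(a)) = sqrt(a). */
double swsqrt_model(double a, int recip)
{
    double x0 = rsqrt_model(a);
    return recip ? x0 : rcp_model(x0);
}

/* Relative comparison with a loose tolerance. */
int close_to(double got, double want)
{
    double d = got - want;
    if (d < 0.0)
        d = -d;
    return d <= 1e-9 * want;
}

void swsqrt_self_check(void)
{
    assert(close_to(swsqrt_model(4.0, 1), 0.5));  /* 1/sqrt(4) */
    assert(close_to(swsqrt_model(4.0, 0), 2.0));  /* sqrt(4) */
}
```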


gcc/
* config/i386/i386.c (ix86_emit_swsqrtsf): Emit vrsqrt28ps.
* config/i386/sse.md (define_expand "rsqrtv16sf2"): New.
gcc/testsuite/
* gcc.target/i386/avx512er-vrsqrt28ps-3.c: New test.
* gcc.target/i386/avx512er-vrsqrt28ps-4.c: New test.
* gcc.target/i386/avx512er-vrsqrt28ps-5.c: New test.
* gcc.target/i386/avx512er-vrsqrt28ps-6.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8e0bf26..edd3d23 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -48722,6 +48722,24 @@ void ix86_emit_swsqrtsf (rtx res, rtx a, machine_mode mode, bool recip)
   e2 = gen_reg_rtx (mode);
   e3 = gen_reg_rtx (mode);
 
+  if (TARGET_AVX512ER && mode == V16SFmode)
+{
+  if (recip)
+   /* res = rsqrt28(a) estimate */
+   emit_insn (gen_rtx_SET (res, gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
+UNSPEC_RSQRT28)));
+  else
+   {
+ /* x0 = rsqrt28(a) estimate */
+ emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
+ UNSPEC_RSQRT28)));
+ /* res = rcp28(x0) estimate */
+ emit_insn (gen_rtx_SET (res, gen_rtx_UNSPEC (mode, gen_rtvec (1, x0),
+  UNSPEC_RCP28)));
+   }
+  return;
+}
+
   real_from_integer (&r, VOIDmode, -3, SIGNED);
   mthree = const_double_from_real_value (r, SFmode);
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6056ddc..c1ea04f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1559,6 +1559,17 @@
   DONE;
 })
 
+(define_expand "rsqrtv16sf2"
+  [(set (match_operand:V16SF 0 "register_operand")
+   (unspec:V16SF
+ [(match_operand:V16SF 1 "vector_operand")]
+ UNSPEC_RSQRT28))]
+  "TARGET_SSE_MATH && TARGET_AVX512ER"
+{
+  ix86_emit_swsqrtsf (operands[0], operands[1], V16SFmode, true);
+  DONE;
+})
+
 (define_insn "_rsqrt2"
   [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
(unspec:VF1_128_256
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-3.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-3.c
new file mode 100644
index 0000000..1ba8172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-3.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512er } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include <math.h>
+#include "avx512er-check.h"
+
+#define MAX 1000
+#define EPS 0.1
+
+__attribute__ ((noinline, optimize (1)))
+void static
+compute_rsqrt_ref (float *a, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = 1.0 / sqrtf (a[i]);
+}
+
+__attribute__ ((noinline))
+void static
+compute_rsqrt_exp (float *a, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = 1.0 / sqrtf (a[i]);
+}
+
+void static
+avx512er_test (void)
+{
+  float in[MAX];
+  float ref[MAX];
+  float exp[MAX];
+
+  for (int i = 0; i < MAX; i++)
+in[i] = 8765.987 - 8.6756 * i;
+
+  compute_rsqrt_ref (in, ref);
+  compute_rsqrt_exp (in, exp);
+
+  for (int i = 0; i < MAX; i++)
+{
+  float rel_err = (ref[i] - exp[i]) / ref[i];
+  rel_err = rel_err > 0.0 ? rel_err : -rel_err;
+  if (rel_err > EPS)
+   abort ();
+}
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-4.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-4.c
new file mode 100644
index 0000000..2f5f73f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-4.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include "avx512er-vrsqrt28ps-3.c"
+
+/* { dg-final { scan-assembler-times "vrsqrt28ps\[^\n\r\]*zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-not "vrcp28ps\[^\n\r\]*zmm\[0-9\]+(?:\n|\[ \\t\]+#)" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-5.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-5.c
new file mode 100644
index 0000000..e067a81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-5.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512er } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include <math.h>
+#include "avx512er-check.h"
+
+#define MAX 1000
+#define EPS 0.1
+
+__attribute__ ((noinline, optimize (1)))
+void static
+compute_sqrt_ref (float *a, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = sqrtf (a[i]);
+}
+
+__attribute__ ((noinline))
+void static
+compute_sqrt_exp (float *a, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = sqrtf (a[i]);
+}
+
+void static
+avx512er_test (void)
+{
+  float in[MAX];
+  float ref[MAX];

[PATCH, i386, AVX-512ER] vrcp28ps auto generation

2016-06-20 Thread Ilya Verbin
Hi!

This patch emits vrcp28ps and vmulps instructions for ix86_emit_swdivsf.
The relative error is < 2^-23, so no additional iteration is necessary.
Regtested using various benchmarks on an AVX-512ER machine.  OK for trunk?
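
As a sketch of why the extra iteration can be dropped: one Newton-Raphson step
x1 = x0 * (2 - b * x0) turns a reciprocal estimate with relative error e into
one with relative error e^2.  A 14-bit estimate therefore needs one step to
get below 2^-23, while a 28-bit estimate is already below it.  The code below
models this in double precision; rcp_estimate is a hypothetical stand-in that
injects a chosen relative error, not the hardware instruction.

```c
#include <assert.h>

/* Model a reciprocal estimate of b with a known relative error. */
double rcp_estimate(double b, double rel_err)
{
    return (1.0 / b) * (1.0 + rel_err);
}

/* One Newton-Raphson refinement step for 1/b. */
double refine(double b, double x0)
{
    return x0 * (2.0 - b * x0);
}

/* Absolute value of the relative error of approx against exact. */
double rel_error(double approx, double exact)
{
    double e = (approx - exact) / exact;
    return e < 0.0 ? -e : e;
}

/* a / b with a 14-bit estimate plus one refinement step. */
double div_rcp14(double a, double b)
{
    return a * refine(b, rcp_estimate(b, 0x1p-14));
}

/* a / b with a 28-bit estimate and no refinement. */
double div_rcp28(double a, double b)
{
    return a * rcp_estimate(b, 0x1p-28);
}

void div_self_check(void)
{
    double a = 179.345, b = 8765.987;
    assert(rel_error(div_rcp14(a, b), a / b) < 0x1p-23);
    assert(rel_error(div_rcp28(a, b), a / b) < 0x1p-23);
}
```

Without the refinement step, the modelled 14-bit estimate leaves a relative
error of about 2^-14, well above the 2^-23 threshold.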


gcc/
* config/i386/i386.c (ix86_emit_swdivsf): Emit vrcp28ps.
gcc/testsuite/
* gcc.target/i386/avx512er-vrcp28ps-3.c: New test.
* gcc.target/i386/avx512er-vrcp28ps-4.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 56a5b9c..8e0bf26 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -48674,8 +48674,19 @@ void ix86_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
 
   /* x0 = rcp(b) estimate */
   if (mode == V16SFmode || mode == V8DFmode)
-emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
-   UNSPEC_RCP14)));
+{
+  if (TARGET_AVX512ER)
+   {
+ emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
+ UNSPEC_RCP28)));
+ /* res = a * x0 */
+ emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, a, x0)));
+ return;
+   }
+  else
+   emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
+   UNSPEC_RCP14)));
+}
   else
 emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
UNSPEC_RCP)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c
new file mode 100644
index 0000000..e08bea4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512er } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include "avx512er-check.h"
+
+#define MAX 1000
+#define EPS 0.1
+
+__attribute__ ((noinline, optimize (0)))
+void static
+compute_rcp_ref (float *a, float *b, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = a[i] / b[i];
+}
+
+__attribute__ ((noinline))
+void static
+compute_rcp_exp (float *a, float *b, float *r)
+{
+  for (int i = 0; i < MAX; i++)
+r[i] = a[i] / b[i];
+}
+
+void static
+avx512er_test (void)
+{
+  float a[MAX];
+  float b[MAX];
+  float ref[MAX];
+  float exp[MAX];
+
+  for (int i = 0; i < MAX; i++)
+{
+  a[i] = 179.345 - 6.5645 * i;
+  b[i] = 8765.987 - 8.6756 * i;
+}
+
+  compute_rcp_ref (a, b, ref);
+  compute_rcp_exp (a, b, exp);
+
+  for (int i = 0; i < MAX; i++)
+{
+  float rel_err = (ref[i] - exp[i]) / ref[i];
+  rel_err = rel_err > 0.0 ? rel_err : -rel_err;
+  if (rel_err > EPS)
+   abort ();
+}
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c
new file mode 100644
index 0000000..2c76d96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx512er" } */
+
+#include "avx512er-vrcp28ps-3.c"
+
+/* { dg-final { scan-assembler-times "vrcp28ps\[^\n\r\]*zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */


  -- Ilya


Re: Cilk Plus testsuite needs massive cleanup (PR testsuite/70595)

2016-06-14 Thread Ilya Verbin
On Fri, Apr 29, 2016 at 11:19:47 -0700, Mike Stump wrote:
> On Apr 29, 2016, at 5:41 AM, Rainer Orth wrote:
> > diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
> > --- a/gcc/config/darwin.h
> > +++ b/gcc/config/darwin.h
> > @@ -179,6 +179,7 @@ extern GTY(()) int darwin_ms_struct;
> >%{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
> >%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
> >  %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
> > +%{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\
> >%{fgnu-tm: \
> >  %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
> >%{!nostdlib:%{!nodefaultlibs:\
> 
> Ok.

Is it OK to backport this patch to gcc-6-branch?
I've re-tested it on macOS with gcc 6.

  -- Ilya


Re: [PATCH, i386, AVX-512] Add vectorizer support builtins

2016-06-02 Thread Ilya Verbin
On Mon, May 23, 2016 at 19:11:53 +0300, Ilya Verbin wrote:
> This patch adds missed 512-bit rounding builtins for vectorization.
> Regtested on x86_64-linux and i686-linux.  OK for trunk?
> 
> gcc/
>   * config/i386/i386-builtin-types.def: Add V16SI_FTYPE_V16SF,
>   V8DF_FTYPE_V8DF_ROUND, V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND.
>   * config/i386/i386.c (enum ix86_builtins): Add
>   IX86_BUILTIN_CVTPS2DQ512_MASK, IX86_BUILTIN_FLOORPS512,
>   IX86_BUILTIN_FLOORPD512, IX86_BUILTIN_CEILPS512, IX86_BUILTIN_CEILPD512,
>   IX86_BUILTIN_TRUNCPS512, IX86_BUILTIN_TRUNCPD512,
>   IX86_BUILTIN_CVTPS2DQ512, IX86_BUILTIN_VEC_PACK_SFIX512,
>   IX86_BUILTIN_FLOORPS_SFIX512, IX86_BUILTIN_CEILPS_SFIX512,
>   IX86_BUILTIN_ROUNDPS_AZ_SFIX512.
>   (builtin_description bdesc_args): Add __builtin_ia32_floorps512,
>   __builtin_ia32_ceilps512, __builtin_ia32_truncps512,
>   __builtin_ia32_floorpd512, __builtin_ia32_ceilpd512,
>   __builtin_ia32_truncpd512, __builtin_ia32_cvtps2dq512,
>   __builtin_ia32_vec_pack_sfix512, __builtin_ia32_roundps_az_sfix512,
>   __builtin_ia32_floorps_sfix512, __builtin_ia32_ceilps_sfix512.
>   Change IX86_BUILTIN_CVTPS2DQ512 to IX86_BUILTIN_CVTPS2DQ512_MASK for
>   __builtin_ia32_cvtps2dq512_mask.
>   (ix86_expand_args_builtin): Handle V8DF_FTYPE_V8DF_ROUND,
>   V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF.
>   (ix86_builtin_vectorized_function): Handle builtins mentioned above.
>   * config/i386/sse.md
>   (avx512f_fix_notruncv16sfv16si):
>   Rename to ...
>   (avx512f_fix_notruncv16sfv16si): ... this.
>   (avx512f_cvtpd2dq512): Rename
>   to ...
>   (avx512f_cvtpd2dq512): ... this.
>   (avx512f_vec_pack_sfix_v8df): New define_expand.
>   (avx512f_roundpd512): Rename to ...
>   (avx512f_round512): ... this.  Change iterator.
>   (avx512f_roundps512_sfix): New define_expand.
>   (round2_sfix): Change iterator.
> gcc/testsuite/
>   * gcc.target/i386/avx512f-ceil-vec-1.c: New test.
>   * gcc.target/i386/avx512f-ceil-vec-2.c: New test.
>   * gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-ceilf-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-ceilf-vec-1.c: New test.
>   * gcc.target/i386/avx512f-ceilf-vec-2.c: New test.
>   * gcc.target/i386/avx512f-floor-vec-1.c: New test.
>   * gcc.target/i386/avx512f-floor-vec-2.c: New test.
>   * gcc.target/i386/avx512f-floorf-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-floorf-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-floorf-vec-1.c: New test.
>   * gcc.target/i386/avx512f-floorf-vec-2.c: New test.
>   * gcc.target/i386/avx512f-rint-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-rint-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-rintf-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-rintf-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-round-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-round-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-roundf-sfix-vec-1.c: New test.
>   * gcc.target/i386/avx512f-roundf-sfix-vec-2.c: New test.
>   * gcc.target/i386/avx512f-trunc-vec-1.c: New test.
>   * gcc.target/i386/avx512f-trunc-vec-2.c: New test.
>   * gcc.target/i386/avx512f-truncf-vec-1.c: New test.
>   * gcc.target/i386/avx512f-truncf-vec-2.c: New test.

Is it OK for gcc-6-branch?

  -- Ilya


[PATCH, i386, AVX-512] Add vectorizer support builtins

2016-05-23 Thread Ilya Verbin
Hi!

This patch adds missing 512-bit rounding builtins for vectorization.
Regtested on x86_64-linux and i686-linux.  OK for trunk?


gcc/
* config/i386/i386-builtin-types.def: Add V16SI_FTYPE_V16SF,
V8DF_FTYPE_V8DF_ROUND, V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND.
* config/i386/i386.c (enum ix86_builtins): Add
IX86_BUILTIN_CVTPS2DQ512_MASK, IX86_BUILTIN_FLOORPS512,
IX86_BUILTIN_FLOORPD512, IX86_BUILTIN_CEILPS512, IX86_BUILTIN_CEILPD512,
IX86_BUILTIN_TRUNCPS512, IX86_BUILTIN_TRUNCPD512,
IX86_BUILTIN_CVTPS2DQ512, IX86_BUILTIN_VEC_PACK_SFIX512,
IX86_BUILTIN_FLOORPS_SFIX512, IX86_BUILTIN_CEILPS_SFIX512,
IX86_BUILTIN_ROUNDPS_AZ_SFIX512.
(builtin_description bdesc_args): Add __builtin_ia32_floorps512,
__builtin_ia32_ceilps512, __builtin_ia32_truncps512,
__builtin_ia32_floorpd512, __builtin_ia32_ceilpd512,
__builtin_ia32_truncpd512, __builtin_ia32_cvtps2dq512,
__builtin_ia32_vec_pack_sfix512, __builtin_ia32_roundps_az_sfix512,
__builtin_ia32_floorps_sfix512, __builtin_ia32_ceilps_sfix512.
Change IX86_BUILTIN_CVTPS2DQ512 to IX86_BUILTIN_CVTPS2DQ512_MASK for
__builtin_ia32_cvtps2dq512_mask.
(ix86_expand_args_builtin): Handle V8DF_FTYPE_V8DF_ROUND,
V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF.
(ix86_builtin_vectorized_function): Handle builtins mentioned above.
* config/i386/sse.md
(avx512f_fix_notruncv16sfv16si):
Rename to ...
(avx512f_fix_notruncv16sfv16si): ... this.
(avx512f_cvtpd2dq512): Rename
to ...
(avx512f_cvtpd2dq512): ... this.
(avx512f_vec_pack_sfix_v8df): New define_expand.
(avx512f_roundpd512): Rename to ...
(avx512f_round512): ... this.  Change iterator.
(avx512f_roundps512_sfix): New define_expand.
(round2_sfix): Change iterator.
gcc/testsuite/
* gcc.target/i386/avx512f-ceil-vec-1.c: New test.
* gcc.target/i386/avx512f-ceil-vec-2.c: New test.
* gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-ceilf-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-ceilf-vec-1.c: New test.
* gcc.target/i386/avx512f-ceilf-vec-2.c: New test.
* gcc.target/i386/avx512f-floor-vec-1.c: New test.
* gcc.target/i386/avx512f-floor-vec-2.c: New test.
* gcc.target/i386/avx512f-floorf-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-floorf-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-floorf-vec-1.c: New test.
* gcc.target/i386/avx512f-floorf-vec-2.c: New test.
* gcc.target/i386/avx512f-rint-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-rint-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-rintf-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-rintf-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-round-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-round-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-roundf-sfix-vec-1.c: New test.
* gcc.target/i386/avx512f-roundf-sfix-vec-2.c: New test.
* gcc.target/i386/avx512f-trunc-vec-1.c: New test.
* gcc.target/i386/avx512f-trunc-vec-2.c: New test.
* gcc.target/i386/avx512f-truncf-vec-1.c: New test.
* gcc.target/i386/avx512f-truncf-vec-2.c: New test.


diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 75d57d9..c66f651 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -294,6 +294,7 @@ DEF_FUNCTION_TYPE (V8DF, V4DF)
 DEF_FUNCTION_TYPE (V8DF, V2DF)
 DEF_FUNCTION_TYPE (V16SI, V4SI)
 DEF_FUNCTION_TYPE (V16SI, V8SI)
+DEF_FUNCTION_TYPE (V16SI, V16SF)
 DEF_FUNCTION_TYPE (V16SI, V16SI, V16SI, UHI)
 DEF_FUNCTION_TYPE (V8DI, V8DI, V8DI, UQI)
 DEF_FUNCTION_TYPE (V8DI, PV8DI)
@@ -1061,14 +1062,17 @@ DEF_FUNCTION_TYPE (VOID, QI, V8DI, PCINT, INT, INT)
 
 DEF_FUNCTION_TYPE_ALIAS (V2DF_FTYPE_V2DF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V4DF_FTYPE_V4DF, ROUND)
+DEF_FUNCTION_TYPE_ALIAS (V8DF_FTYPE_V8DF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V4SF_FTYPE_V4SF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V8SF_FTYPE_V8SF, ROUND)
+DEF_FUNCTION_TYPE_ALIAS (V16SF_FTYPE_V16SF, ROUND)
 
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V2DF_V2DF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V8SI_FTYPE_V4DF_V4DF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V16SI_FTYPE_V8DF_V8DF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V4SI_FTYPE_V4SF, ROUND)
 DEF_FUNCTION_TYPE_ALIAS (V8SI_FTYPE_V8SF, ROUND)
+DEF_FUNCTION_TYPE_ALIAS (V16SI_FTYPE_V16SF, ROUND)
 
 DEF_FUNCTION_TYPE_ALIAS (INT_FTYPE_V2DF_V2DF, PTEST)
 DEF_FUNCTION_TYPE_ALIAS (INT_FTYPE_V2DI_V2DI, PTEST)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1cb88d6..049a006 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -30935,7 +30935,7 @@ enum ix86_builtins
   IX86_BUILTIN_CVTPD2PS512,
   

Re: [PATCH][Testsuite] Force testing of vectorized builtins rather than inlined i387 asm

2016-05-23 Thread Ilya Verbin
On Sat, May 21, 2016 at 09:51:36 +0200, Uros Bizjak wrote:
> On Fri, May 20, 2016 at 8:01 PM, Ilya Verbin <iver...@gmail.com> wrote:
> > In some cases the i387 version of a math function may be inlined from math.h,
> > and the testcase (like gcc.target/i386/sse4_1-ceil-vec.c) will actually test
> > inlined asm instead of vectorized builtin.  To fix this I've created a new file
> > gcc.dg/mathfunc.h (similar to gcc.dg/strlenopt.h) and changed vectorization
> > tests so that they include it instead of math.h.
> > Regtested on x86_64-linux and i686-linux.  Is it OK for trunk?
> 
> No, please just #define NO_MATH_INLINES before math.h is included.
> This will solve unwanted inlining.

Thanks for the hint.  I'll check-in this patch tomorrow.


gcc/testsuite/
* gcc.target/i386/avx-ceil-sfix-2-vec.c: Define __NO_MATH_INLINES before
math.h is included.
* gcc.target/i386/avx-floor-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-rint-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-round-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx512f-ceil-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floor-sfix-vec-1.c: Likewise.
* gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceil-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-vec.c: Likewise.
* gcc.target/i386/sse4_1-trunc-vec.c: Likewise.
* gcc.target/i386/sse4_1-truncf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-vec.c: Likewise.


diff --git a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
index bf48b80..45b7af7 100644
--- a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
@@ -13,6 +13,7 @@
 
 #include CHECK_H
 
+#define __NO_MATH_INLINES
 #include <math.h>
 
 extern double ceil (double);
diff --git a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
index 275199c..0a28c76 100644
--- a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
@@ -13,6 +13,7 @@
 
 #include CHECK_H
 
+#define __NO_MATH_INLINES
 #include <math.h>
 
 extern double floor (double);
diff --git a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
index 9f273af..e6c47b8 100644
--- a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
@@ -13,6 +13,7 @@
 
 #include CHECK_H
 
+#define __NO_MATH_INLINES
 #include <math.h>
 
 extern double rint (double);
diff --git a/gcc/testsuite/gcc.target/i386/avx-round-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-round-sfix-2-vec.c
index ddb46d9..dc0a7db 100644
--- a/gcc/testsuite/gcc.target/i386/avx-round-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-round-sfix-2-vec.c
@@ -13,6 +13,7 @@
 
 #include CHECK_H
 
+#define __NO_MATH_INLINES
 #include <math.h>
 
 extern double round (double);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-ceil-sfix-vec-1.c b/gcc/testsuite/gcc.target/i386/avx512f-ceil-sfix-vec-1.c
index 038d25e..d7d6916 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-ceil-sfix-vec-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-ceil-sfix-vec-1.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target avx512f } */
 /* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
+#define __NO_MATH_INLINES
 #include <math.h>
 #include "avx512f-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-floor-sfix-vec-1.c b/gcc/testsuite/gcc.target/i386/avx512f-floor-sfix-vec-1.c
index fab7e65..b46ea9f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-floor-sfix-vec-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-floor-sfix-vec-1.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target avx512f } */
 /* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
+#define __NO_MATH_INLINES
 #include <math.h>
 #include "avx512f-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ceil-sfix-vec.c b/gcc/testsuite/gcc.target/i386/sse4_1-ceil-sfix-vec.c
index ca07d9c..bb32c8d 100644
--- a/gcc/testsuite/gcc.target/i386/sse4_1-ceil-s

[PATCH][Testsuite] Force testing of vectorized builtins rather than inlined i387 asm

2016-05-20 Thread Ilya Verbin
Hi!

In some cases the i387 version of a math function may be inlined from math.h,
and the testcase (like gcc.target/i386/sse4_1-ceil-vec.c) will actually test
inlined asm instead of vectorized builtin.  To fix this I've created a new file
gcc.dg/mathfunc.h (similar to gcc.dg/strlenopt.h) and changed vectorization
tests so that they include it instead of math.h.
Regtested on x86_64-linux and i686-linux.  Is it OK for trunk?

gcc/testsuite/
* gcc.dg/mathfunc.h: New file.
* gcc.target/i386/avx-ceil-sfix-2-vec.c: Do not skip if there is no M_PI
for vxworks_kernel.  Include mathfunc.h instead of math.h.  Remove
declaration.
* gcc.target/i386/avx-cvt-2-vec.c: Likewise.
* gcc.target/i386/avx-floor-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-rint-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-round-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx512f-ceil-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floor-sfix-vec-1.c: Likewise.
* gcc.target/i386/sse2-cvt-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceil-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-vec.c: Likewise.
* gcc.target/i386/sse4_1-trunc-vec.c: Likewise.
* gcc.target/i386/sse4_1-truncf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.  Use floorf
instead of __builtin_floorf.
* gcc.target/i386/sse4_1-floorf-vec.c: Likewise.


diff --git a/gcc/testsuite/gcc.dg/mathfunc.h b/gcc/testsuite/gcc.dg/mathfunc.h
new file mode 100644
index 0000000..1c1b7bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mathfunc.h
@@ -0,0 +1,20 @@
+/* This is a replacement of needed parts from math.h for testing vectorization,
+   to ensure we are testing the builtins rather than whatever the OS has in its
+   headers.  */
+
+#define M_PI  3.14159265358979323846
+
+extern double ceil (double);
+extern float ceilf (float);
+
+extern double floor (double);
+extern float floorf (float);
+
+extern double trunc (double);
+extern float truncf (float);
+
+extern double round (double);
+extern float roundf (float);
+
+extern double rint (double);
+extern float rintf (float);
diff --git a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
index bf48b80..567a16d 100644
--- a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,9 +12,7 @@
 
 #include CHECK_H
 
-#include <math.h>
-
-extern double ceil (double);
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
index 0081dcf..8a8369b 100644
--- a/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,7 +12,7 @@
 
 #include CHECK_H
 
-#include <math.h>
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
index 275199c..44002b4 100644
--- a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,9 +12,7 @@
 
 #include CHECK_H
 
-#include <math.h>
-
-extern double floor (double);
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c b/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
index 9f273af..980b341 100644
--- a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
+++ 

Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-11 Thread Ilya Verbin
On Wed, May 11, 2016 at 10:47:49 +0100, Ramana Radhakrishnan wrote:
> 
> > I've looked at the generated code in more details, and for armv6 this 
> > generates
> > mcr p15, 0, r0, c7, c10, 5
> > which is not what __cilkrts_fence uses currently (CP15DSB vs CP15DMB)
> 
> Wow I hadn't noticed that it was a DSB -  DSB is way too heavy weight. 
> Userland shouldn't need to use this by default IMNSHO. It's needed if you are 
> working on non-cacheable memory or performing cache maintenance operations 
> but I can't imagine cilkplus wanting to do that ! 
> 
> http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf
> 
> It's almost like the default definitions need to be in terms of the atomic 
> extensions rather than having these written in this form. Folks usually get 
> this wrong ! 
> 
> > Looking at arm/sync.md it seems that there is no way to generate CP15DSB.
> 
> No - there is no way of generating DSB,  DMB's should be sufficient for this 
> purpose. Would anyone know what the semantics of __cilkrts_fence are that 
> require this to be a DSB ? 

__cilkrts_fence semantics are identical to __sync_synchronize, so DMB looks OK.

Maybe we should just define:
  #define __cilkrts_fence() __sync_synchronize()
?

  -- Ilya

> Ramana
> 
> > 
> >> Christophe
> >>
> >>> Thanks,
> >>>   -- Ilya
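
A rough sketch of the suggestion in this message (mapping the fence onto
__sync_synchronize), assuming a GCC-compatible compiler.  The names
example_fence and fence_and_count are made up for illustration; the runtime's
own macro is __cilkrts_fence, which is deliberately not redefined here.

```c
/* Sketch only: example_fence is a made-up stand-in for __cilkrts_fence,
   so that no reserved identifier is redefined in this example.  */
#define example_fence() __sync_synchronize()

/* Number of barriers issued so far (illustrative bookkeeping only).  */
static int fence_calls;

/* Issue a full memory barrier via the GCC builtin and return how many
   barriers have been issued in total.  */
static int fence_and_count(void)
{
    example_fence();
    return ++fence_calls;
}
```

With this shape the compiler picks the barrier instruction for the selected
architecture level, instead of the runtime hard-coding a CP15 operation.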


Re: [PATCH][CilkPlus] Allow parenthesized initialization in for-loops

2016-05-10 Thread Ilya Verbin
On Fri, Mar 25, 2016 at 18:23:23 +0300, Ilya Verbin wrote:
> On Mon, Mar 21, 2016 at 15:58:18 +0100, Jakub Jelinek wrote:
> > On Mon, Mar 21, 2016 at 05:45:52PM +0300, Ilya Verbin wrote:
> > > www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm
> > > says:
> > >   In C++, the control variable shall be declared and initialized within the
> > >   initialization clause of the _Cilk_for loop. The variable shall have automatic
> > >   storage duration. The variable shall be initialized. Initialization may be
> > >   explicit, using assignment or constructor syntax, or implicit via a nontrivial
> > >   default constructor.
> > > 
> > > This patch enables constructor-syntax initialization.
> > > Bootstrapped and regtested on x86_64-linux.  OK for stage1?
> > 
> > Does this affect just _Cilk_for or also #pragma simd?
> 
> It affects both.
> 
> > What about (some_class i { 0 }; some_class < ...; some_class++)
> > and similar syntax?
> 
> It's allowed, thanks, I missed this in the initial patch.
> 
> > The testsuite coverage is insufficient (nothing e.g.
> > tests templates or #pragma simd).
> 
> Patch is updated.  Is it sufficient now?
> 
> 
> gcc/cp/
>   * parser.c (cp_parser_omp_for_loop_init): Allow constructor syntax in
>   Cilk Plus for-loop initialization.
> gcc/testsuite/
>   * g++.dg/cilk-plus/CK/for2.cc: New test.
>   * g++.dg/cilk-plus/for5.C: New test.

Ping.

  -- Ilya


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-10 Thread Ilya Verbin
On Tue, May 10, 2016 at 14:36:36 +0100, Ramana Radhakrishnan wrote:
> On Tue, May 10, 2016 at 2:02 PM, Christophe Lyon wrote:
> > On 9 May 2016 at 15:34, Christophe Lyon wrote:
> >> On 9 May 2016 at 15:29, Jeff Law wrote:
> >>> On 05/09/2016 01:37 AM, Christophe Lyon wrote:
> >>>> After this merge, I'm seeing lots of timeouts on arm (using QEMU).
> >>>> Is this "expected"? (as in: should I increase my timeout value)
> >>>
> >>> I wouldn't say it's expected; this is the first time Cilk+ has been
> >>> supported on ARM.  It could be a bug in the ARM support in the runtime, an
> >>> ARM compiler bug or even a bug in the ARM QEMU support.
> >>>
> >>> Probably the first step is to see if it's working properly on real hardware.
> >>> That would at least allow us to eliminate QEMU from the equation if it's
> >>> failing in the same manner on a real machine.
> >>>
> >> OK, I'll check that.
> >> I wanted to know if I was missing something obvious.
> >
> > I've tested in an armhf chroot on an armv8 machine, and I saw SIGILL errors
> > on:
> > mcr 15, 0, r3, cr7, cr10, {4}
> > which is how __cilkrts_fence is implemented in
> > ../libcilkrts/runtime/config/arm/os-fence.h
> 
> At first glance I'd ask why this shouldn't be __atomic_thread_fence or
> __atomic_signal_fence ( SEQ_CST)  if that's what they want here and
> then it will work (TM) regardless of architecture levels.
> 
> > This instruction is not supported anymore on armv8. Recent arm64 kernels
> > have handlers for it.
> >
> > So we may want the implementation to be conditional, or prefer to rely on
> > kernel support.

The ARM enabling code was taken from a community contribution; we haven't
tested it.  If someone wants to fix this, it would be appreciated.

Thanks,
  -- Ilya
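
For reference, a minimal sketch of the __atomic_thread_fence (SEQ_CST)
alternative raised in the quoted text, assuming GCC's __atomic builtins.  The
variables and the publish/try_consume helpers are hypothetical and are not
part of libcilkrts.

```c
#include <stdbool.h>

static int payload;   /* data being handed over (hypothetical) */
static int ready;     /* non-zero once payload is valid (hypothetical) */

/* Writer side: store the data, issue a full fence, then raise the flag.  */
static void publish(int value)
{
    __atomic_store_n(&payload, value, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader side: check the flag, issue a full fence, then read the data.
   Returns true and fills *out only if the flag was already set.  */
static bool try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return false;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    *out = __atomic_load_n(&payload, __ATOMIC_RELAXED);
    return true;
}
```

Nothing in this sketch names a specific barrier instruction, so the compiler
is free to choose one that is valid for the target architecture level.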


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-10 Thread Ilya Verbin
On Mon, May 09, 2016 at 11:39:51 +0200, Matthias Klose wrote:
> >well, it breaks the build for many multilib configurations where multilib
> >binaries cannot be run on the current environment, e.g. building x32 multilibs
> >on a kernel which doesn't have x32 enabled.
> >
> >The reason is again moving compiler checks (AC_USE_SYSTEM_EXTENSIONS) in the
> >configure.ac before the AM_ENABLE_MULTILIB.
> >
> >So please move the AC_USE_SYSTEM_EXTENSIONS macro behind the AM_ENABLE_MULTILIB,
> >and forward this change upstream if applicable.
> >
> >example build log at
> >https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=amd64&ver=20160506-1&stamp=1462580913
> 
> fixed by:
> 
> * configure.ac: Move AC_USE_SYSTEM_EXTENSIONS behind AM_ENABLE_MULTILIB.
> * configure: Regenerate.
> 
> --- a/libcilkrts/configure.ac
> +++ b/libcilkrts/configure.ac
> @@ -51,9 +51,6 @@
>  target_alias=${target_alias-$host_alias}
>  AC_SUBST(target_alias)
> 
> -# Test for GNU extensions. Will define _GNU_SOURCE if they're available
> -AC_USE_SYSTEM_EXTENSIONS
> -
>  AM_INIT_AUTOMAKE(foreign no-dist)
> 
>  AM_MAINTAINER_MODE
> @@ -60,6 +57,9 @@
> 
>  AM_ENABLE_MULTILIB(, ..)
> 
> +# Test for GNU extensions. Will define _GNU_SOURCE if they're available
> +AC_USE_SYSTEM_EXTENSIONS
> +
>  # Build a DLL on Windows
>  # AC_LIBTOOL_WIN32_DLL
>  AC_PROG_CC

Thanks for investigating and fixing this.  The patch is pushed upstream.

  -- Ilya


Re: [PATCH] Apply fix for PR68463 to RS6000

2016-05-10 Thread Ilya Verbin
On Tue, May 10, 2016 at 11:48:53 -0400, David Edelsohn wrote:
> On Tue, May 10, 2016 at 11:39 AM, James Norris wrote:
> > The fix for PR68463 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463)
> > was missing code that prevented the fix from working on RS6000. The
> > attached patch adds the missing code for RS6000.
> >
> > Bootstrapped and regtested on Power8.
> >
> > OK for trunk?
> >
> > Thanks!
> > Jim
> >
> > =
> >
> > ChangeLog
> >
> > * config/rs6000/sysv4.h (CRTOFFLOADBEGIN): Define. Add crtoffloadbegin.o
> > if offloading is enabled and -fopenacc or -fopenmp is specified.
> > (CRTOFFLOADEND): Likewise.
> > (STARTFILE_LINUX_SPEC): Add CRTOFFLOADBEGIN.
> > (ENDFILE_LINUX_SPEC): Add CRTOFFLOADEND.
> 
> Why is this enabled for openmp?  Not all openmp applications require offloading.
> 
> I see that the same logic is used in config/gnu-user.h, but I'm
> curious about the need.

Yes, this adds a bit of overhead to openmp applications without offloading (when
the compiler is configured with offloading enabled).  But there is no way to
determine from the driver whether the application uses offloading or not; only
the underlying lto-wrapper can determine this by analyzing the object files.

  -- Ilya
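
To illustrate the point about the driver: the spec fragment keys only on the
command-line options, not on what the objects contain.  The sketch below
reuses the CRTOFFLOADBEGIN/CRTOFFLOADEND definitions from config/gnu-user.h;
the shortened start-file spec and both helper functions are hypothetical, and
ENABLE_OFFLOADING is forced to 1 here purely as an assumption for the example.

```c
#include <string.h>

/* Assumption for this sketch: pretend the compiler was configured with
   offloading enabled.  In GCC the value comes from configure.  */
#ifndef ENABLE_OFFLOADING
#define ENABLE_OFFLOADING 1
#endif

#if ENABLE_OFFLOADING == 1
#define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:crtoffloadbegin%O%s}"
#define CRTOFFLOADEND "%{fopenacc|fopenmp:crtoffloadend%O%s}"
#else
#define CRTOFFLOADBEGIN ""
#define CRTOFFLOADEND ""
#endif

/* Hypothetical, heavily shortened start-file spec; the real
   STARTFILE_SPEC contains many more alternatives.  */
static const char *example_startfile_spec(void)
{
    return "crtbegin.o%s " CRTOFFLOADBEGIN;
}

/* Returns non-zero if the spec would pull in the offload start file
   whenever -fopenacc or -fopenmp is given.  */
static int spec_adds_offload_begin(void)
{
    return strstr(example_startfile_spec(), "crtoffloadbegin") != NULL;
}
```

Because the condition is just "-fopenacc or -fopenmp was passed", every such
link gets the extra objects, whether or not any target region exists.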


Re: Allow redefinition of libcilkrts debug macros

2016-04-29 Thread Ilya Verbin
Hi Rainer!

On Fri, Apr 29, 2016 at 10:58:25 +0200, Rainer Orth wrote:
> > On 04/26/2016 08:04 AM, Rainer Orth wrote:
> >> When working on a couple of Cilk Plus issues lately (PRs target/60290,
> >> target/68945), I noticed that you have to modify the libcilkplus sources
> >> to enable various debugging output.  This seems silly, and the following
> >> patch allows defining them from the command line.
> >>
> >> Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12.
> >>
> >> Ok for mainline?
> >>
> >>Rainer
> >>
> >>
> >> 2016-04-07  Rainer Orth  
> >>
> >>* runtime/except-gcc.cpp (DEBUG_EXCEPTIONS): Allow redefinition.
> >>* runtime/cilk_fiber.h (FIBER_DEBUG): Likewise.
> >>* runtime/scheduler.h (REDPAR_DEBUG): Likewise.
> > Ilya will have to chime in here -- we're a downstream consumer of the Cilk+
> > runtime.  So these patches need to go into Intel's tree first, then Ilya
> > can bring them into the GCC tree.
> 
> I suspected that much.  It would be good to have a libcilkrts/README.gcc
> describing the rules which changes can go into the gcc tree directly,
> which need to go upstream first, and how.  libgo and libsanitizer already
> have this.

Could you please submit your patch to 
?
All patches for libcilkrts/* should go there first in order to avoid possible
license issues, or possible losses during the merge.

Thanks,
  -- Ilya


Re: [PATCH][CilkPlus] Fix PR69363

2016-04-20 Thread Ilya Verbin
On Wed, Feb 17, 2016 at 15:46:00 +0100, Jakub Jelinek wrote:
> On Wed, Feb 17, 2016 at 05:32:58PM +0300, Ilya Verbin wrote:
> > This patch fixes <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69363>
> > Bootstrap and make check passed.  OK for... stage 1?
> 
> Ok for stage1, with a few nits.

Committed to trunk with fixed line lengths.


gcc/c-family/
PR c++/69363
* c-cilkplus.c (c_finish_cilk_clauses): Remove function.
* c-common.h (c_finish_cilk_clauses): Remove declaration.
gcc/c/
PR c++/69363
* c-parser.c (c_parser_cilk_all_clauses): Use c_finish_omp_clauses
instead of c_finish_cilk_clauses.
* c-tree.h (c_finish_omp_clauses): Add new default argument.
* c-typeck.c (c_finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/cp/
PR c++/69363
* cp-tree.h (finish_omp_clauses): Add new default argument.
* parser.c (cp_parser_cilk_simd_all_clauses): Use finish_omp_clauses
instead of c_finish_cilk_clauses.
* semantics.c (finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/testsuite/
PR c++/69363
* c-c++-common/cilk-plus/PS/clauses3.c: Adjust dg-error string.
* c-c++-common/cilk-plus/PS/clauses4.c: New test.
* c-c++-common/cilk-plus/PS/pr69363.c: New test.


diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 3e7902fd..9f1f364 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -41,56 +41,6 @@ c_check_cilk_loop (location_t loc, tree decl)
   return true;
 }
 
-/* Validate and emit code for <#pragma simd> clauses.  */
-
-tree
-c_finish_cilk_clauses (tree clauses)
-{
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  tree prev = clauses;
-
-  /* If a variable appears in a linear clause it cannot appear in
-any other OMP clause.  */
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR)
-   for (tree c2 = clauses; c2; c2 = OMP_CLAUSE_CHAIN (c2))
- {
-   if (c == c2)
- continue;
-   enum omp_clause_code code = OMP_CLAUSE_CODE (c2);
-
-   switch (code)
- {
- case OMP_CLAUSE_LINEAR:
- case OMP_CLAUSE_PRIVATE:
- case OMP_CLAUSE_FIRSTPRIVATE:
- case OMP_CLAUSE_LASTPRIVATE:
- case OMP_CLAUSE_REDUCTION:
-   break;
-
- case OMP_CLAUSE_SAFELEN:
-   goto next;
-
- default:
-   gcc_unreachable ();
- }
-
-   if (OMP_CLAUSE_DECL (c) == OMP_CLAUSE_DECL (c2))
- {
-   error_at (OMP_CLAUSE_LOCATION (c2),
- "variable appears in more than one clause");
-   inform (OMP_CLAUSE_LOCATION (c),
-   "other clause defined here");
-   // Remove problematic clauses.
-   OMP_CLAUSE_CHAIN (prev) = OMP_CLAUSE_CHAIN (c2);
- }
- next:
-   prev = c2;
- }
-}
-  return clauses;
-}
-
 /* Calculate number of iterations of CILK_FOR.  */
 
 tree
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index fa3746c..663e457 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1369,7 +1369,6 @@ extern enum stv_conv scalar_to_vector (location_t loc, enum tree_code code,
   tree op0, tree op1, bool);
 
 /* In c-cilkplus.c  */
-extern tree c_finish_cilk_clauses (tree);
 extern tree c_validate_cilk_plus_loop (tree *, int *, void *);
 extern bool c_check_cilk_loop (location_t, tree);
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1b6bacd..bdd669d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17509,7 +17509,7 @@ c_parser_cilk_all_clauses (c_parser *parser)
 
  saw_error:
   c_parser_skip_to_pragma_eol (parser);
-  return c_finish_cilk_clauses (clauses);
+  return c_finish_omp_clauses (clauses, false, false, true);
 }
 
 /* This function helps parse the grainsize pragma for a _Cilk_for statement.
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index d559207..4633182 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -661,7 +661,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree, bool, bool = false);
+extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 59a3c61..58c2139 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@

Re: FW: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2016-03-29 Thread Ilya Verbin
On Tue, Mar 29, 2016 at 17:15:11 +0200, Thomas Schwinge wrote:
> On Mon, 28 Mar 2016 19:40:22 +0300, Ilya Verbin <iver...@gmail.com> wrote:
> > Do you plan to commit this patch? :)
> 
> Well, I'm also still waiting for you guys to merge (via the upstream
> Intel sources repository) my GNU Hurd portability patches; submitted to
> GCC in
> <http://news.gmane.org/find-root.php?message_id=%3C8738bae1mp.fsf%40kepler.schwinge.homeip.net%3E>
> and the following messages, dated 2014-09-26.  Upon request of Barry M
> Tannenbaum then submitted to the Intel web site, and then never heard of
> again...  ;-(

I'm going to merge libcilkrts from upstream at stage1.  Your patch is there:
https://bitbucket.org/intelcilkruntime/intel-cilk-runtime/commits/2b33a7bfcbcd1def8108287475755b68b4aef2f7

  -- Ilya


Re: FW: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2016-03-28 Thread Ilya Verbin
Hi Thomas!

Do you plan to commit this patch? :)

On Mon, Sep 29, 2014 at 09:24:40 -0600, Jeff Law wrote:
> On 09/29/14 08:26, Thomas Schwinge wrote:
> >On Mon, 29 Sep 2014 13:58:31 +, "Tannenbaum, Barry M" wrote:
> >>In a nutshell, add the following code to main() before the call to f3():
> >>
> >> int status = __cilkrts_set_param("nworkers", "2");
> >> if (0 != status) {
> >> // Failed to set the number of Cilk workers
> >> return status;
> >> }
> >
> >Yeah, that's what I had proposed with the patch at the end of my previous
> >email,
> >.
> >I'm sorry if I didn't make it obvious that more text and the patch were
> >following after the full-quote of the original issue description.
> >
> >>Here's the details: [...]
> >
> >Thanks again for your helpful comments; that's appreciated.
> >
> >Here's again my proposed patch.  Note, that the include paths in GCC
> >compiler testing (gcc/testsuite/) are not set up to pick up the
> > include file, so I've manually added a prototype for
> >the __cilkrts_set_param function to the three files.  I can change that,
> >if requested.
> >
> >commit ee7138e451d1f3284d6fa0f61fe517c82db94060
> >Author: Thomas Schwinge 
> >Date:   Mon Sep 29 12:47:34 2014 +0200
> >
> > Audit Cilk Plus tests for CILK_NWORKERS=1.
> >
> > gcc/testsuite/
> > * c-c++-common/cilk-plus/CK/spawning_arg.c (main): Call
> > __cilkrts_set_param to set two workers.
> > * c-c++-common/cilk-plus/CK/steal_check.c (main): Likewise.
> > * g++.dg/cilk-plus/CK/catch_exc.cc (main): Likewise.
> OK.
> Jeff

  -- Ilya


Re: [PATCH][CilkPlus] Allow parenthesized initialization in for-loops

2016-03-25 Thread Ilya Verbin
On Mon, Mar 21, 2016 at 15:58:18 +0100, Jakub Jelinek wrote:
> On Mon, Mar 21, 2016 at 05:45:52PM +0300, Ilya Verbin wrote:
> > www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm
> > says:
> >   In C++, the control variable shall be declared and initialized within the
> >   initialization clause of the _Cilk_for loop. The variable shall have automatic
> >   storage duration. The variable shall be initialized. Initialization may be
> >   explicit, using assignment or constructor syntax, or implicit via a nontrivial
> >   default constructor.
> > 
> > This patch enables constructor-syntax initialization.
> > Bootstrapped and regtested on x86_64-linux.  OK for stage1?
> 
> Does this affect just _Cilk_for or also #pragma simd?

It affects both.

> What about (some_class i { 0 }; some_class < ...; some_class++)
> and similar syntax?

It's allowed, thanks, I missed this in the initial patch.

> The testsuite coverage is insufficient (nothing e.g.
> tests templates or #pragma simd).

Patch is updated.  Is it sufficient now?


gcc/cp/
* parser.c (cp_parser_omp_for_loop_init): Allow constructor syntax in
Cilk Plus for-loop initialization.
gcc/testsuite/
* g++.dg/cilk-plus/CK/for2.cc: New test.
* g++.dg/cilk-plus/for5.C: New test.
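
The acceptance rule this version of the patch applies to the token following
the declarator can be summarized by the small model below.  It is only a
sketch of the early error branches in cp_parser_omp_for_loop_init, not parser
code: the enum and the function name are hypothetical stand-ins for the
CPP_EQ, CPP_OPEN_PAREN, CPP_OPEN_BRACE and CPP_SEMICOLON checks.

```c
#include <stdbool.h>

/* Hypothetical token kinds standing in for the cpplib token types.  */
enum next_token {
    TOK_EQ,          /* '=' */
    TOK_OPEN_PAREN,  /* '(' */
    TOK_OPEN_BRACE,  /* '{' */
    TOK_SEMICOLON,   /* ';' */
    TOK_OTHER
};

/* Returns true if the token after the declarator gets past the early
   error checks.  OpenMP loops still require '='; Cilk Plus loops
   (CILK_SIMD, CILK_FOR) also accept '(', '{' and ';'.  */
static bool init_token_accepted(bool is_cilk, enum next_token next)
{
    if (!is_cilk)
        return next == TOK_EQ;
    return next == TOK_EQ || next == TOK_OPEN_PAREN
           || next == TOK_OPEN_BRACE || next == TOK_SEMICOLON;
}
```

Later checks in the parser (for example on the declared type) are outside the
scope of this model.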


diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cd09de6..e481c0c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -33284,62 +33284,74 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
   if (declarator == cp_error_declarator) 
cp_parser_skip_to_end_of_statement (parser);
-
   else 
{
  tree pushed_scope, auto_node;
+ bool is_cilk, is_class, next_is_semicol, next_is_eq, next_is_op_paren,
+  next_is_op_brace;
 
   decl = start_decl (declarator, &type_specifiers,
  SD_INITIALIZED, attributes,
  /*prefix_attributes=*/NULL_TREE,
  &pushed_scope);
 
+ is_class = CLASS_TYPE_P (TREE_TYPE (decl));
  auto_node = type_uses_auto (TREE_TYPE (decl));
- if (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ))
+ is_cilk = code == CILK_SIMD || code == CILK_FOR;
+ next_is_semicol
+   = cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON);
+ next_is_op_paren
+   = cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN);
+ next_is_op_brace
+   = cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE);
+ next_is_eq = cp_lexer_next_token_is (parser->lexer, CPP_EQ);
+
+ if (!is_cilk && next_is_op_paren)
{
- if (cp_lexer_next_token_is (parser->lexer, 
- CPP_OPEN_PAREN))
-   {
- if (code != CILK_SIMD && code != CILK_FOR)
-   error ("parenthesized initialization is not allowed in "
-  "OpenMP %<for%> loop");
- else
-   error ("parenthesized initialization is "
-  "not allowed in for-loop");
-   }
- else
-   /* Trigger an error.  */
-   cp_parser_require (parser, CPP_EQ, RT_EQ);
-
+ error ("parenthesized initialization is not allowed in "
+"OpenMP %<for%> loop");
  init = error_mark_node;
  cp_parser_skip_to_end_of_statement (parser);
}
- else if (CLASS_TYPE_P (TREE_TYPE (decl))
-  || type_dependent_expression_p (decl)
-  || auto_node)
+ else if (!is_cilk && !next_is_eq)
+   {
+ /* Trigger an error.  */
+ cp_parser_require (parser, CPP_EQ, RT_EQ);
+ init = error_mark_node;
+ cp_parser_skip_to_end_of_statement (parser);
+   }
+ else if (is_cilk && !(next_is_eq || next_is_op_paren
+   || next_is_op_brace || next_is_semicol))
+   {
+ cp_parser_error (parser, "expected %<=%>, %<(%>, %<{%> or %<;%>");
+ init = error_mark_node;
+ cp_parser_skip_to_end_of_statement (parser);
+   }
+ else if (is_class || type_dependent_expression_p (decl) || auto_node)
{
  bool is_direct_init, is_non_constant_init;
 
- init = cp_parser_initializer (parser,
-   &is_direct_init,
-   &is_non_constant_init);
-
+ if (is_cilk && next_is_semicol)
+   init = NULL_TREE;
+ else
+   init = cp_parser_initializer (parser,
+

[PATCH][CilkPlus] Allow parenthesized initialization in for-loops

2016-03-21 Thread Ilya Verbin
Hi!

www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm
says:
  In C++, the control variable shall be declared and initialized within the
  initialization clause of the _Cilk_for loop. The variable shall have automatic
  storage duration. The variable shall be initialized. Initialization may be
  explicit, using assignment or constructor syntax, or implicit via a nontrivial
  default constructor.

This patch enables constructor-syntax initialization.
Bootstrapped and regtested on x86_64-linux.  OK for stage1?


gcc/cp/
* parser.c (cp_parser_omp_for_loop_init): Allow parenthesized
initialization in Cilk Plus for-loops.
gcc/testsuite/
* g++.dg/cilk-plus/CK/for2.cc: New test.


diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d58b5aa..49b3791 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -33284,54 +33284,68 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
   if (declarator == cp_error_declarator) 
cp_parser_skip_to_end_of_statement (parser);
-
   else 
{
  tree pushed_scope, auto_node;
+ bool is_cilk, is_class, next_is_semicol, next_is_eq, next_is_open;
 
   decl = start_decl (declarator, &type_specifiers,
  SD_INITIALIZED, attributes,
  /*prefix_attributes=*/NULL_TREE,
  &pushed_scope);
 
+ is_class = CLASS_TYPE_P (TREE_TYPE (decl));
  auto_node = type_uses_auto (TREE_TYPE (decl));
- if (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ))
-   {
- if (cp_lexer_next_token_is (parser->lexer, 
- CPP_OPEN_PAREN))
-   {
- if (code != CILK_SIMD && code != CILK_FOR)
-   error ("parenthesized initialization is not allowed in "
-  "OpenMP %<for%> loop");
- else
-   error ("parenthesized initialization is "
-  "not allowed in for-loop");
-   }
- else
-   /* Trigger an error.  */
-   cp_parser_require (parser, CPP_EQ, RT_EQ);
+ is_cilk = code == CILK_SIMD || code == CILK_FOR;
+ next_is_semicol
+   = cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON);
+ next_is_eq = cp_lexer_next_token_is (parser->lexer, CPP_EQ);
+ next_is_open = cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN);
 
+ if (!is_cilk && next_is_open)
+   {
+ error ("parenthesized initialization is not allowed in "
+"OpenMP %<for%> loop");
+ init = error_mark_node;
+ cp_parser_skip_to_end_of_statement (parser);
+   }
+ else if (!is_cilk && !next_is_eq)
+   {
+ /* Trigger an error.  */
+ cp_parser_require (parser, CPP_EQ, RT_EQ);
  init = error_mark_node;
  cp_parser_skip_to_end_of_statement (parser);
}
- else if (CLASS_TYPE_P (TREE_TYPE (decl))
-  || type_dependent_expression_p (decl)
-  || auto_node)
+ else if (is_cilk && !(next_is_eq || next_is_open || next_is_semicol))
+   {
+ cp_parser_error (parser, "expected %<=%>, %<(%> or %<;%>");
+ init = error_mark_node;
+ cp_parser_skip_to_end_of_statement (parser);
+   }
+ else if (is_cilk && !is_class && next_is_open)
+   {
+ cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN);
+ init = cp_parser_assignment_expression (parser);
+ cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+ goto non_class;
+   }
+ else if (is_class || type_dependent_expression_p (decl) || auto_node)
{
  bool is_direct_init, is_non_constant_init;
 
- init = cp_parser_initializer (parser,
-   &is_direct_init,
-   &is_non_constant_init);
-
+ if (is_cilk && next_is_semicol)
+   init = NULL_TREE;
+ else
+   init = cp_parser_initializer (parser,
+ &is_direct_init,
+ &is_non_constant_init);
  if (auto_node)
{
  TREE_TYPE (decl)
= do_auto_deduction (TREE_TYPE (decl), init,
 auto_node);
 
- if (!CLASS_TYPE_P (TREE_TYPE (decl))
- && !type_dependent_expression_p (decl))
+ if (!is_class && !type_dependent_expression_p (decl))
goto non_class;
}
  
@@ -9,7 +33353,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
  asm_specification,
   

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-24 Thread Ilya Verbin
On Mon, Feb 22, 2016 at 16:13:07 +0100, Thomas Schwinge wrote:
> (..., and similar for others.)  The if-exists spec function only works
> for absolute paths (I have not researched, why?), so it won't locate the
> files for relative -Bbuild-gcc/[...] prefixes, and linking will fail:
> 
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x0): undefined reference to `__offload_func_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x8): undefined reference to `__offload_funcs_end'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x10): undefined reference to `__offload_var_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x18): undefined reference to `__offload_vars_end'
> 
> If I use the absolute -B$PWD/build-gcc/[...], it works.  (But there is no
> requirement for -B prefixes to be absolute, as far as I know.)  Why not
> make it a hard error, though, if these files are missing?  Can we use
> something like (untested pseudo-patch):
> 
> +#ifdef ENABLE_OFFLOADING
> +# define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:%:crtoffloadbegin%O%s}"
> +#else
> +# define CRTOFFLOADBEGIN ""
> +#endif
> 
> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> %{" NO_PIE_SPEC ":crtbegin.o%s}} \
> %{fvtable-verify=none:%s; \
>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
> - fvtable-verify=std:vtv_start.o%s}"
> + fvtable-verify=std:vtv_start.o%s} \
> +   " CRTOFFLOADBEGIN ")}"

Fixed.  Actually ENABLE_OFFLOADING is always defined (to 0 or to 1).
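
The distinction matters because a macro defined to 0 still satisfies a
defined() test.  A small self-contained illustration, assuming a
configuration without offloading; the SEEN_BY_* macros and the two helper
functions are made up for this example:

```c
/* Assumption: model a compiler configured WITHOUT offloading, where the
   macro is still defined, just with the value 0.  */
#define ENABLE_OFFLOADING 0

/* A test that only asks whether the macro exists: true even for 0.  */
#if defined(ENABLE_OFFLOADING)
#define SEEN_BY_DEFINED_TEST 1
#else
#define SEEN_BY_DEFINED_TEST 0
#endif

/* A test that looks at the value: false when offloading is disabled.  */
#if ENABLE_OFFLOADING == 1
#define SEEN_BY_VALUE_TEST 1
#else
#define SEEN_BY_VALUE_TEST 0
#endif

/* Report which of the two preprocessor tests considered offloading
   to be enabled.  */
static int defined_test_result(void) { return SEEN_BY_DEFINED_TEST; }
static int value_test_result(void) { return SEEN_BY_VALUE_TEST; }
```

Here defined_test_result() yields 1 while value_test_result() yields 0, which
is why the conditions in the patch compare against 1.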

> To the casual reader, skipping the first offload_files looks like an
> off-by-one error, so I suggest you add a comment "Skip the dummy item at
> the start of the list.", or similar.

Done.

> Ilya, then please remove
> libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c as part of
> your patch, unless Tom thinks it should be changed to a -flto test, but
> without -fno-use-linker-plugin?

Done.
Here is a follow-up patch.  OK for trunk?  Bootstrapped and regtested.
Unfortunately I'm unable to run bootstrap-lto:
libdecnumber/dpd/decimal32.c:53:0: error: type of ‘decDigitsFromDPD’ does not match original declaration [-Werror=lto-type-mismatch]
[...]


diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2fdb63c..b0bf40a 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -35,6 +35,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #undef ASM_APP_OFF
 #define ASM_APP_OFF "#NO_APP\n"
 
+#if ENABLE_OFFLOADING == 1
+#define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:crtoffloadbegin%O%s}"
+#define CRTOFFLOADEND "%{fopenacc|fopenmp:crtoffloadend%O%s}"
+#else
+#define CRTOFFLOADBEGIN ""
+#define CRTOFFLOADEND ""
+#endif
+
 /* Provide a STARTFILE_SPEC appropriate for GNU userspace.  Here we add
the GNU userspace magical crtbegin.o file (see crtstuff.c) which
provides part of the support for getting C++ file-scope static
@@ -50,7 +58,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
+   " CRTOFFLOADBEGIN
 #else
 #define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:gcrt1.o%s;:crt1.o%s}} \
@@ -58,7 +66,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s} \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
+   " CRTOFFLOADBEGIN
 #endif
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC GNU_USER_TARGET_STARTFILE_SPEC
@@ -76,14 +84,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  fvtable-verify=std:vtv_end.o%s} \
%{shared:crtendS.o%s;: %{" PIE_SPEC ":crtendS.o%s} \
%{" NO_PIE_SPEC ":crtend.o%s}} crtn.o%s \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadend%O%s)}"
+   " CRTOFFLOADEND
 #else
 #define GNU_USER_TARGET_ENDFILE_SPEC \
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s} \
%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s \
-   %{fopenacc|fopenmp:%:if-exists(crtoffloadend%O%s)}"
+   " CRTOFFLOADEND
 #endif
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC GNU_USER_TARGET_ENDFILE_SPEC
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index a4ea3ac..4ab6397 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -40,7 +40,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "tm.h"
 #include "libgcc_tm.h"
 
-#if defined(HAVE_GAS_HIDDEN) && defined(ENABLE_OFFLOADING)
+#if defined(HAVE_GAS_HIDDEN) && ENABLE_OFFLOADING == 1
 
 #define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
 #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
diff --git 

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-24 Thread Ilya Verbin
On Wed, Feb 24, 2016 at 17:13:35 +0100, Thomas Schwinge wrote:
> On Tue, 23 Feb 2016 08:37:07 +0100, Tom de Vries <tom_devr...@mentor.com> wrote:
> > On 22/02/16 19:07, Ilya Verbin wrote:
> > > 2016-02-22 18:13 GMT+03:00 Thomas Schwinge<tho...@codesourcery.com>:
> > >> >On Sat, 20 Feb 2016 13:54:20 +0300, Ilya Verbin<iver...@gmail.com>  
> > >> >wrote:
> > >>> >>On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
> > >>>> >> >On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
> > >>>>> >> > >This patch adds crtoffload{begin,end}.o to all -fopenmp 
> > >>>>> >> > >programs, if they exist.
> 
> > >>> >>Thomas, could you please test it using nvptx
> > >> >
> > >> >It mostly;-)  works.  With nvptx offloading enabled (which you don't
> > >> >have, do you?), I'm seeing one test case regress:
> > >> >
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 
> > >> > 9)
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 
> > >> > 13)
> > >> > PASS: 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > >> > [-PASS:-]{+FAIL:+} 
> > >> > libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> > >> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> > >> >
> > >> >(Same for C++.)  That testcase, just recently added by Tom in r233237
> > >> >"Handle -fdiagnostics-color in lto", specifies 'dg-additional-options
> > >> >"-flto -fno-use-linker-plugin"'.  Is that now an unsupported
> > >> >combination/configuration?  (I have not yet looked in detail, but it
> > >> >appears as if the offloading compilers are no longer being run for
> > >> >-fno-use-linker-plugin.)
> > > Yes, it's really hard to fix the "lto + non-lto objects" issue for
> > > the no-use-linker-plugin LTO path. In this patch lto-plugin prepares a
> > > list of object files with offloading and passes it to lto-wrapper, so
> > > I believe we should consider offloading without lto-plugin as
> > > unsupported. I'll update the wiki once the patch is committed.
> 
> Aha, I see.  I guess there's no point in keeping offloading supported for
> the -fno-lto (default) with -fno-use-linker-plugin configuration?
> 
> Ilya, then please remove
> libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims-2.c as part of
> your patch, unless Tom thinks it should be changed to a -flto test, but
> without -fno-use-linker-plugin?

OK.

> > Shouldn't we error (or at least warn) then if we compile a file 
> > containing an offload construct with fopenacc/fopenmp and 
> > -fno-use-linker-plugin?
> 
> Yes, that makes sense to me, too.  (Note that, as I understand it,
> -fno-use-linker-plugin may also be the default for certain GCC
> configurations...)  Aside from spec stuff in gcc/gcc.c relating to
> LINK_PLUGIN_SPEC, I see there's some code in
> gcc/gcc.c:driver::maybe_run_linker evaluating the three possible values
> of HAVE_LTO_PLUGIN, but I have not yet thought about how and where to
> conditionalize the diagnostic if attempting to do offloading in an
> unsupported (-fno-use-linker-plugin) configuration.

To print this error, something has to detect that at least one object contains
offload sections; only the linker plugin and lto-wrapper can do that.  But if the
linker plugin is absent, lto-wrapper has to open all objects, scan all their
sections, etc.  That looks like too much overhead for a single diagnostic.

  -- Ilya
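
To make the cost argument concrete, the check amounts to something like the
following for every object on the link line.  This is an illustration only,
not how lto-plugin or lto-wrapper is implemented: the helper is hypothetical,
and it assumes the section names of one object have already been read into an
array.  The two section-name macros are the ones from libgcc/offloadstuff.c.

```c
#include <stddef.h>
#include <string.h>

/* Section names as defined in libgcc/offloadstuff.c.  */
#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"

/* Hypothetical helper: given the section names of one object file,
   report whether any of them is an offload table section.  */
static int has_offload_section(const char *const *names, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], OFFLOAD_FUNC_TABLE_SECTION_NAME) == 0
            || strcmp(names[i], OFFLOAD_VAR_TABLE_SECTION_NAME) == 0)
            return 1;
    return 0;
}
```

The string comparison itself is cheap; the expensive part is opening each
object and reading its section table just to feed a helper like this.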


Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-22 Thread Ilya Verbin
2016-02-22 18:13 GMT+03:00 Thomas Schwinge <tho...@codesourcery.com>:
> On Sat, 20 Feb 2016 13:54:20 +0300, Ilya Verbin <iver...@gmail.com> wrote:
>> On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
>> > On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
>> > > This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if they exist.
>> > > I couldn't think of a better solution...
>> > > Tested using the testcase from the previous mail, e.g.:
>> > >
>> > > $ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
>> > > $ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
>> > > $ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
>> > > $ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
>> > > $ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
>> > > $ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
>> > > $ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
>> > > $ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
>> > > $ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o
>> > >
>> > > And other combinations.
>
>> Thomas, could you please test it using nvptx
>
> It mostly ;-) works.  With nvptx offloading enabled (which you don't
> have, do you?), I'm seeing one test case regress:
>
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 9)
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  (test for errors, line 13)
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} 
> libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
>
> (Same for C++.)  That testcase, just recently added by Tom in r233237
> "Handle -fdiagnostics-color in lto", specifies 'dg-additional-options
> "-flto -fno-use-linker-plugin"'.  Is that now an unsupported
> combination/configuration?  (I have not yet looked in detail, but it
> appears as if the offloading compilers are no longer being run for
> -fno-use-linker-plugin.)

Yes, it's really hard to fix the "lto + non-lto objects" issue for the
no-use-linker-plugin LTO path.  In this patch lto-plugin prepares a
list of object files with offloading and passes it to lto-wrapper, so
I believe we should consider offloading without lto-plugin as
unsupported.  I'll update the wiki once the patch is committed.
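
A rough sketch of that hand-off, assuming the list is a plain text file with
one object name per line (the on-disk format and both function names are
assumptions for illustration, not the actual lto-plugin/lto-wrapper code):

```c
#include <stdio.h>
#include <string.h>

/* Producer side: write COUNT object file names to PATH, one per line.
   Returns 0 on success, -1 on failure.  */
static int
write_offload_list (const char *path, const char *const *names, int count)
{
  FILE *f = fopen (path, "w");
  if (f == NULL)
    return -1;
  for (int i = 0; i < count; i++)
    fprintf (f, "%s\n", names[i]);
  return fclose (f) == 0 ? 0 : -1;
}

/* Consumer side: read up to MAX names from PATH into OUT.  Names longer
   than 255 characters are not handled by this sketch.  Returns the number
   of names read, or -1 if PATH cannot be opened.  */
static int
read_offload_list (const char *path, char out[][256], int max)
{
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  int n = 0;
  while (n < max && fgets (out[n], 256, f) != NULL)
    {
      out[n][strcspn (out[n], "\n")] = '\0';  /* Strip the newline.  */
      if (out[n][0] != '\0')
        n++;
    }
  fclose (f);
  return n;
}
```

Without the plugin there is no producer for this list, which is why the
-fno-use-linker-plugin path ends up with nothing to compile for the offload
targets.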

>> including the testcase with static
>> libraries?
>
> Works in my manual testing if I work around the following issue:
>
>> --- a/gcc/config/gnu-user.h
>> +++ b/gcc/config/gnu-user.h
>> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
>>  If not, see
>> %{" NO_PIE_SPEC ":crtbegin.o%s}} \
>> %{fvtable-verify=none:%s; \
>>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
>> - fvtable-verify=std:vtv_start.o%s}"
>> + fvtable-verify=std:vtv_start.o%s} \
>> +   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
>
> (..., and similar for others.)  The if-exists spec function only works
> for absolute paths (I have not researched, why?), so it won't locate the
> files for relative -Bbuild-gcc/[...] prefixes, and linking will fail:
>
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x0): undefined reference to 
> `__offload_func_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x8): undefined reference to 
> `__offload_funcs_end'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x10): undefined reference to 
> `__offload_var_table'
> /tmp/ccGajPD4.crtoffloadtable.o:(.rodata+0x18): undefined reference to 
> `__offload_vars_end'
>
> If I use the absolute -B$PWD/build-gcc/[...], it works.  (But there is no
> requirement for -B prefixes to be absolute, as far as I know.)  Why not
> make it a hard error, though, if these files are missing?  Can we use
> something like (untested pseudo-patch):
>
> +#ifdef ENABLE_OFFLOADING
> +# define CRTOFFLOADBEGIN "%{fopenacc|fopenmp:%:crtoffloadbegin%O%s}"
> +#else
> +# define CRTOFFLOADBEGIN ""
> +#endif
>
> @@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME 
> respectively.  If not, see
>   %{" NO_PIE_SPEC ":crtbegin.o%s}} \
> %{fvtable-verify=none:%s; \
>   fvtable-verify=preinit:vtv_start_preinit.o%s; \
> - fvtable

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-20 Thread Ilya Verbin
On Fri, Feb 19, 2016 at 15:53:08 +0100, Jakub Jelinek wrote:
> On Wed, Feb 10, 2016 at 08:19:34PM +0300, Ilya Verbin wrote:
> > This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if they 
> > exist.
> > I couldn't think of a better solution...
> > Tested using the testcase from the previous mail, e.g.:
> > 
> > $ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
> > $ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
> > $ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
> > $ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
> > $ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
> > $ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
> > $ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
> > $ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
> > $ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o
> > 
> > And other combinations.
> 
> Looking at this, I think I have no problem with crtoffloadbegin.o being
> included in all -fopenmp/-fopenacc linked programs/shared libraries,
> that just defines the symbols and nothing else.
> I have no problem with the
> __offload_funcs_end/__offload_vars_end part of crtoffloadend.o being
> included too.
> But, I really don't like __OFFLOAD_TABLE__ being added to all programs, that
> wastes real space in data (rodata or relro?) section, and dynamic
> relocations.
> So, perhaps, can we split offloadstuff.c into 3 objects instead of 2,
> crtoffload{begin,end,table}.o*, where the last one would be what
> defines __OFFLOAD_TABLE__, and add the last one only by the linker
> plugin/lto-wrapper/whatever, if any input objects had any offloading stuff
> in it?

Done.  Bootstrapped and regtested, lto-bootstrap in progress.

Thomas, could you please test it using nvptx, including the testcase with static
libraries?

Could this patch be considered for stage4?  On the one hand, this is not a
regression.  On the other hand, it fixes quite serious issues, and it shouldn't
affect non-offloading configurations.


gcc/
PR driver/68463
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
crtoffloadbegin.o for -fopenacc/-fopenmp if it exists.
(GNU_USER_TARGET_ENDFILE_SPEC): Add crtoffloadend.o for
-fopenacc/-fopenmp if it exists.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(find_offloadbeginend): Replace with ...
(find_crtoffloadtable): ... this.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Don't call find_offloadbeginend and
don't pass offloadbegin and offloadend to the linker.  Don't pass
offload non-LTO files to the linker, because now they're not claimed.
libgcc/
PR driver/68463
* Makefile.in (crtoffloadtable$(objext)): New rule.
* configure.ac (extra_parts): Add crtoffloadtable$(objext) if
enable_offload_targets is not empty.
* configure: Regenerate.
* offloadstuff.c: Move __OFFLOAD_TABLE__ from crtoffloadend to
crtoffloadtable.
lto-plugin/
PR driver/68463
* lto-plugin.c (struct plugin_offload_file): New.
(offload_files): Change type.
(offload_files_last, offload_files_last_obj): New.
(offload_files_last_lto): New.
(free_2): Adjust accordingly.
(all_symbols_read_handler): Don't add offload files to lto_arg_ptr.
Don't call free_1 for offload_files.  Write names of object files with
offloading to the temporary file.  Add new option to lto_arg_ptr.
(claim_file_handler): Don't claim file if it contains offload sections
without LTO sections.  If it contains offload sections, add to the list.


diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 2f1bbcc..2fdb63c 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -49,14 +49,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  %{" NO_PIE_SPEC ":crtbegin.o%s}} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s}"
+ fvtable-verify=std:vtv_start.o%s} \
+   %{fopenacc|fopenmp:%:if-exists(crtoffloadbegin%O%s)}"
 #else
 #define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{!shared: %{pg|p|profile:gcrt1.o%s;:crt1.o%s}} \
crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s}"
+ fvtable-verify=std:vtv_start.o%s} \
+   %{fopenacc|fopenmp:%:if-exists(crtoffloadb

Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper

2016-02-19 Thread Ilya Verbin
On Fri, Feb 19, 2016 at 20:41:58 +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 2 Oct 2014 19:14:57 +0400, Ilya Verbin <iver...@gmail.com> wrote:
> > With this patch lto-wrapper performs invocation of mkoffload tool for each
> > offload target.  This tool [...]
> > will compile IR from .gnu.offload_lto_* sections into offload
> > target code and embed the resultant code (offload image) into the new host's
> > object file.
> 
> Consider the following scenario:
> 
> $ cat < CSTS-214-acc.c
> int acc (void)
> {
>   int a;
> 
> #pragma acc parallel num_gangs (1) copyout (a)
>   a = 100;
> 
>   return a;
> }
> $ cat < CSTS-214-test.c
> extern int acc (void);
> 
> int main (void)
> {
>   if (acc () != 100)
>     __builtin_abort ();
>   
>   return 0;
> }
> 
> Compile these two files as follows:
> 
> $ [GCC] -fopenacc -c CSTS-214-acc.c
> $ x86_64-linux-gnu-ar -cr CSTS-214-acc.a CSTS-214-acc.o
> $ [GCC] -fopenacc CSTS-214-test.c CSTS-214-acc.a
> 
> The last step will fail -- with incomprehensible diagnostics, ;-) as so
> often when offloading fails...  Here's what's going on: the
> LTO/offloading machinery correctly identifies that it needs to process
> the CSTS-214-acc.c:acc function, present in the CSTS-214-acc.a archive
> file at a certain offset, and it "encodes" that as follows:
> CSTS-214-acc.a@0x9e (see lto-plugin/lto-plugin.c:claim_file_handler, the
> "file->offset != 0" code right at the beginning).  This makes its way
> down through here:
> 
> > --- a/gcc/lto-wrapper.c
> > +++ b/gcc/lto-wrapper.c
> 
> > +/* Copy a file from SRC to DEST.  */
> > +
> > +static void
> > +copy_file (const char *dest, const char *src)
> > +{
> > +  [...]
> > +}
> 
> > @@ -624,6 +852,54 @@ run_gcc (unsigned argc, char *argv[])
> 
> > +  /* If object files contain offload sections, but do not contain LTO 
> > sections,
> > + then there is no need to perform a link-time recompilation, i.e.
> > + lto-wrapper is used only for a compilation of offload images.  */
> > +  if (have_offload && !have_lto)
> > +{
> > +  for (i = 1; i < argc; ++i)
> > +   if ([...])
> > + {
> > +   char *out_file;
> > +   /* Can be ".o" or ".so".  */
> > +   char *ext = strrchr (argv[i], '.');
> > +   if (ext == NULL)
> > + out_file = make_temp_file ("");
> > +   else
> > + out_file = make_temp_file (ext);
> > +   /* The linker will delete the files we give it, so make copies.  */
> > +   copy_file (out_file, argv[i]);
> > +   printf ("%s\n", out_file);
> > + }
> > +[...]
> > +  goto finish;
> > +}
> > +
> >if (lto_mode == LTO_MODE_LTO)
> >  {
> >flto_out = make_temp_file (".lto.o");
> > @@ -850,6 +1126,10 @@ cont:
> >obstack_free (_obstack, NULL);
> >  }
> >  
> > + finish:
> > +  if (offloadend)
> > +printf ("%s\n", offloadend);
> > +
> >obstack_free (_obstack, NULL);
> >  }
> 
> When we hit this, for argv "CSTS-214-acc.a@0x9e", the copy_file call will
> fail -- there is no "CSTS-214-acc.a@0x9e" file to copy.  If we strip off
> the "@0x[...]" suffix (but still printf the filename including the
> suffix), then things work.  I copied that bit of code from earlier in
> this function, where the same archive offset handling needs to be done.
> Probably that code should be refactored a bit.
> 
> Also, I wonder if the "ext == NULL" case can really happen, and needs to
> be handled as done in the code cited above, or if that can be simplified?
> (Not yet tested that.)
> 
> Will something like the following be OK to fix this issue, or is that
> something "that should not happen", should be fixed differently?
> 
> --- gcc/lto-wrapper.c
> +++ gcc/lto-wrapper.c
> @@ -1161,15 +1161,31 @@ run_gcc (unsigned argc, char *argv[])
>   && strncmp (argv[i], "-flinker-output=",
>   sizeof ("-flinker-output=") - 1) != 0)
> {
> + char *p;
> + off_t file_offset = 0;
> + long loffset;
> + int consumed;
> + char *filename = argv[i];
> +
> + if ((p = strrchr (argv[i], '@'))
> + && p != argv[i] 
> + && sscanf (p, "@%li

Re: Partial Offloading (was: [hsa merge 07/10] IPA-HSA pass)

2016-02-17 Thread Ilya Verbin
On Thu, Jan 28, 2016 at 12:36:19 +0100, Thomas Schwinge wrote:
> I made an attempt to capture the recent discussion (plus my own
> ideas/understanding) in this new section:
> .  Please
> change/extend, as required.

Thanks for summarizing this.


I'm not very happy with how -foffload=disable works in GCC 6; here is a testcase:

int main ()
{
  int x = 10;
  #pragma omp target data map (from: x)
    #pragma omp target map (alloc: x)
      x = 20;
  if (x != 10 && x != 20)
    __builtin_abort ();
}

On a system with a non-shared-memory accelerator it will abort, because "#pragma
omp target data" behaves as if offloading is enabled, but "#pragma omp target"
runs on the host.  As a result, at the end of the *target data* region, it tries
to receive x from the target and receives 0, or crashes.
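
The mismatch can be modelled on the host alone.  This is only a toy sketch
under stated assumptions: the struct and function are invented for
illustration, it is not libgomp code, and the never-written device copy is
modelled as 0 (on real hardware it is simply uninitialized):

```c
#include <stdbool.h>

/* Toy model of one mapped variable on a non-shared-memory device.  */
struct mapping
{
  int host;     /* Host copy of x.  */
  int device;   /* Device copy of x; map (from:) does not initialize it.  */
};

/* Model the testcase.  DATA_USES_DEVICE says whether the "target data"
   region really maps x to the device; TARGET_ON_DEVICE says whether the
   "target" region body runs there.  Returns the final host value of x.  */
static int
run_model (bool data_uses_device, bool target_on_device)
{
  struct mapping x = { 10, 0 };   /* Device copy modelled as 0.  */

  /* #pragma omp target map (alloc: x)  ->  x = 20;  */
  if (target_on_device)
    x.device = 20;
  else
    x.host = 20;                  /* Host fallback.  */

  /* End of "target data map (from: x)": copy back from the device.  */
  if (data_uses_device)
    x.host = x.device;

  return x.host;
}
```

In this model run_model (true, true) and run_model (false, false) both give
20, while the mixed case run_model (true, false) gives 0, which is the value
that makes the testcase abort.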

We could forbid the -foffload=disable option, but I think it's very useful, e.g.
for comparing performance of host vs. accelerator using the same compiler, etc.
Or if the system contains 2 different accelerators, someone might want to
compile only for the first, but libgomp will load 2 plugins, and the program
will crash (instead of falling back) if it tries to use the second device.

So, maybe we still need something like this patch?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01033.html

  -- Ilya


Re: [PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
On Wed, Feb 17, 2016 at 16:28:34 +0100, Marek Polacek wrote:
> On Wed, Feb 17, 2016 at 04:14:22PM +0100, Jakub Jelinek wrote:
> > On Wed, Feb 17, 2016 at 04:11:44PM +0100, Marek Polacek wrote:
> > > On Wed, Feb 17, 2016 at 06:08:14PM +0300, Ilya Verbin wrote:
> > > > > This line is too long.  But you could have just done
> > > > 
> > > > My editor shows exactly 80 chars.
> > > 
> > > The maximum is 79.
> > 
> > Well, check_GNU_style.sh complains just about one line, and then
> > a prototype.
> > 
> > Lines should not exceed 80 characters.
> > 193:+extern tree finish_omp_clauses  (tree, bool, bool = 
> > false, bool = false);
> > 252:+  error ("linear clause applied to non-integral 
> > non-pointer "
> 
> Maybe it should be fixed with this then.  Because
> <https://www.gnu.org/prep/standards/standards.html#Formatting> says
> "Please keep the length of source lines to 79 characters or less, for maximum
> readability in the widest range of environments."

https://gcc.gnu.org/codingconventions.html#Line says 80.

  -- Ilya


Re: [PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
On Wed, Feb 17, 2016 at 15:46:00 +0100, Jakub Jelinek wrote:
> On Wed, Feb 17, 2016 at 05:32:58PM +0300, Ilya Verbin wrote:
> > + && !SCALAR_FLOAT_TYPE_P (TREE_TYPE (t))
> > + && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
> > +   {
> > + error_at (OMP_CLAUSE_LOCATION (c),
> > +   "linear clause applied to non-integral, "
> > +   "non-floating, non-pointer variable with type %qT",
> > +   TREE_TYPE (t));
> > + remove = true;
> > + break;
> > +   }
> > +   }
> > + else
> > +   {
> > + if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> > + && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
> > +   {
> > + error_at (OMP_CLAUSE_LOCATION (c),
> > +   "linear clause applied to non-integral non-pointer "
> 
> This line is too long.  But you could have just done

My editor shows exactly 80 chars.

> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> 
> > + error ("linear clause applied to non-integral, "
> > +"non-floating, non-pointer variable with %qT type",
> 
> Again too long line, that needs to be wrapped more.

OK, here is 81.

> > +TREE_TYPE (t));
> > + remove = true;
> > + break;
> > +   }
> > +   }
> > + else
> > +   {
> > + if (!INTEGRAL_TYPE_P (type)
> > + && TREE_CODE (type) != POINTER_TYPE)
> > +   {
> > + error ("linear clause applied to non-integral non-pointer 
> > "
> > +"variable with %qT type", TREE_TYPE (t));
> > + remove = true;
> > + break;
> 
> And this can be done like I've hinted above.

OK, here is 81.

  -- Ilya


[PATCH][CilkPlus] Fix PR69363

2016-02-17 Thread Ilya Verbin
Hi!

This patch fixes PR c++/69363.
Bootstrap and make check passed.  OK for... stage 1?


gcc/c-family/
PR c++/69363
* c-cilkplus.c (c_finish_cilk_clauses): Remove function.
* c-common.h (c_finish_cilk_clauses): Remove declaration.
gcc/c/
PR c++/69363
* c-parser.c (c_parser_cilk_all_clauses): Use c_finish_omp_clauses
instead of c_finish_cilk_clauses.
* c-tree.h (c_finish_omp_clauses): Add new default argument.
* c-typeck.c (c_finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/cp/
PR c++/69363
* cp-tree.h (finish_omp_clauses): Add new default argument.
* parser.c (cp_parser_cilk_simd_all_clauses): Use finish_omp_clauses
instead of c_finish_cilk_clauses.
* semantics.c (finish_omp_clauses): Add new argument.  Allow
floating-point variables in the linear clause for Cilk Plus.
gcc/testsuite/
PR c++/69363
* c-c++-common/cilk-plus/PS/clauses3.c: Adjust dg-error string.
* c-c++-common/cilk-plus/PS/clauses4.c: New test.
* c-c++-common/cilk-plus/PS/pr69363.c: New test.


diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 3e7902fd..9f1f364 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -41,56 +41,6 @@ c_check_cilk_loop (location_t loc, tree decl)
   return true;
 }
 
-/* Validate and emit code for <#pragma simd> clauses.  */
-
-tree
-c_finish_cilk_clauses (tree clauses)
-{
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  tree prev = clauses;
-
-  /* If a variable appears in a linear clause it cannot appear in
-any other OMP clause.  */
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR)
-   for (tree c2 = clauses; c2; c2 = OMP_CLAUSE_CHAIN (c2))
- {
-   if (c == c2)
- continue;
-   enum omp_clause_code code = OMP_CLAUSE_CODE (c2);
-
-   switch (code)
- {
- case OMP_CLAUSE_LINEAR:
- case OMP_CLAUSE_PRIVATE:
- case OMP_CLAUSE_FIRSTPRIVATE:
- case OMP_CLAUSE_LASTPRIVATE:
- case OMP_CLAUSE_REDUCTION:
-   break;
-
- case OMP_CLAUSE_SAFELEN:
-   goto next;
-
- default:
-   gcc_unreachable ();
- }
-
-   if (OMP_CLAUSE_DECL (c) == OMP_CLAUSE_DECL (c2))
- {
-   error_at (OMP_CLAUSE_LOCATION (c2),
- "variable appears in more than one clause");
-   inform (OMP_CLAUSE_LOCATION (c),
-   "other clause defined here");
-   // Remove problematic clauses.
-   OMP_CLAUSE_CHAIN (prev) = OMP_CLAUSE_CHAIN (c2);
- }
- next:
-   prev = c2;
- }
-}
-  return clauses;
-}
-
 /* Calculate number of iterations of CILK_FOR.  */
 
 tree
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index fa3746c..663e457 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1369,7 +1369,6 @@ extern enum stv_conv scalar_to_vector (location_t loc, 
enum tree_code code,
   tree op0, tree op1, bool);
 
 /* In c-cilkplus.c  */
-extern tree c_finish_cilk_clauses (tree);
 extern tree c_validate_cilk_plus_loop (tree *, int *, void *);
 extern bool c_check_cilk_loop (location_t, tree);
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7a27244..4770f45d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17427,7 +17427,7 @@ c_parser_cilk_all_clauses (c_parser *parser)
 
  saw_error:
   c_parser_skip_to_pragma_eol (parser);
-  return c_finish_cilk_clauses (clauses);
+  return c_finish_omp_clauses (clauses, false, false, true);
 }
 
 /* This function helps parse the grainsize pragma for a _Cilk_for statement.
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 96ab049..8bfd256 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -661,7 +661,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree, bool, bool = false);
+extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 1122a88..d91bd72 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12527,7 +12527,8 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
Remove any elements from the list that are invalid.  */
 
 tree
-c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd)
+c_finish_omp_clauses (tree clauses, bool 

Re: [PATCH][RFC][Offloading] Fix PR68463

2016-02-10 Thread Ilya Verbin
Hi!

On Tue, Jan 19, 2016 at 16:32:13 +0300, Ilya Verbin wrote:
> On Tue, Jan 19, 2016 at 10:36:28 +0100, Jakub Jelinek wrote:
> > On Tue, Jan 19, 2016 at 09:57:01AM +0100, Richard Biener wrote:
> > > On Mon, 18 Jan 2016, Ilya Verbin wrote:
> > > > On Fri, Jan 15, 2016 at 09:15:01 +0100, Richard Biener wrote:
> > > > > On Fri, 15 Jan 2016, Ilya Verbin wrote:
> > > > > > II) The __offload_func_table, __offload_funcs_end, 
> > > > > > __offload_var_table,
> > > > > > __offload_vars_end are now provided by the linker script, instead of
> > > > > > crtoffload{begin,end}.o, this allows to surround all offload 
> > > > > > objects, even
> > > > > > those that are not claimed by lto-plugin.
> > > > > > Unfortunately it works only with ld, but doesn't work with gold, 
> > > > > > because
> > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> > > > > > Any thoughts how to enable this linker script for gold?
> > > > > 
> > > > > The easiest way would probably to add this handling to the default
> > > > > "linker script" in gold.  I don't see an easy way around requiring
> > > > > changes to gold here - maybe dumping the default linker script from
> > > > > bfd and injecting the rules with some scripting so you have a complete
> > > > > script.  Though likely gold won't grok that result.
> > > > > 
> > > > > Really a question for Ian though.
> > > > 
> > > > Or the gcc driver can add crtoffload{begin,end}.o, but the problem is 
> > > > that it
> > > > can't determine whether the program contains offloading or not.  So it 
> > > > can add
> > > > them to all -fopenmp/-fopenacc programs, if the compiler was configured 
> > > > with
> > > > --enable-offload-targets=...  The overhead would be about 340 bytes for
> > > > binaries which don't use offloading.  Is this acceptable?  (Jakub?)
> > > 
> > > Can lto-wrapper add them as plugin outputs?  Or does that wreck ordering?
> 
> Currently it's implemented this way, but it will not work after my patch,
> because e.g. offload-without-lto.o and offload-with-lto.o will be linked in
> this order:
> offload-without-lto.o, crtoffloadbegin.o, offload-with-lto.o, crtoffloadend.o
> ^
> (will not be claimed by the plugin)
> 
> But we need this one:
> crtoffloadbegin.o, offload-without-lto.o, offload-with-lto.o, crtoffloadend.o
> 
> > Yeah, if that would work, it would be certainly appreciated, one thing is
> > wasting .text space and relocations in all -fopenmp programs (for -fopenacc
> > programs one kind of assumes there will be some offloading in there),
> > another one some extra constructor/destructor or what that would be even
> > worse.
> 
> They contain only 5 symbols, without constructors/destructors.

This patch adds crtoffload{begin,end}.o to all -fopenmp programs, if they exist.
I couldn't think of a better solution...
Tested using the testcase from the previous mail, e.g.:

$ gcc -DNUM=1 -c -fopenmp test.c -o obj1.o
$ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
$ gcc -DNUM=3 -c -fopenmp test.c -o obj3.o
$ gcc -DNUM=4 -c -fopenmp test.c -o obj4.o -flto
$ gcc -DNUM=5 -c -fopenmp test.c -o obj5.o
$ gcc -DNUM=6 -c -fopenmp test.c -o obj6.o -flto
$ gcc -DNUM=7 -c -fopenmp test.c -o obj7.o
$ gcc-ar -cvq libtest.a obj3.o obj4.o obj5.o
$ gcc -fopenmp main.c obj1.o obj2.o libtest.a obj6.o obj7.o

And other combinations.


gcc/
PR driver/68463
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
crtoffloadbegin.o for -fopenacc/-fopenmp if it exists.
(GNU_USER_TARGET_ENDFILE_SPEC): Add crtoffloadend.o for
-fopenacc/-fopenmp if it exists.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(copy_file): Remove function.
(find_offloadbeginend): Remove function.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Don't call find_offloadbeginend and
don't pass offloadbegin and offloadend to the linker.  Don't pass
offload non-LTO files to the linker, because now they're not claimed.
lto-plugin/
PR driver/68463
* lto-plugin.c (struct plugin_offload_file): New.
(offload_files): Change

Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:20:11 +0100, Tom de Vries wrote:
> On 26/01/16 14:01, Ilya Verbin wrote:
> >On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> >>On 25/01/16 14:27, Ilya Verbin wrote:
> >>>On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >>>>>diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >>>>>index 62e5454..cdaee41 100644
> >>>>>--- a/gcc/lto-cgraph.c
> >>>>>+++ b/gcc/lto-cgraph.c
> >>>>>@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >>>>>   tree fn_decl
> >>>>> = lto_file_decl_data_get_fn_decl (file_data, 
> >>>>> decl_index);
> >>>>>   vec_safe_push (offload_funcs, fn_decl);
> >>>>>+
> >>>>>+  /* Prevent IPA from removing fn_decl as unreachable, 
> >>>>>since there
> >>>>>+ may be no refs from the parent function to child_fn in 
> >>>>>offload
> >>>>>+ LTO mode.  */
> >>>>>+  cgraph_node::get (fn_decl)->mark_force_output ();
> >>>>> }
> >>>>>   else if (tag == LTO_symtab_variable)
> >>>>> {
> >>>>>@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >>>>>   tree var_decl
> >>>>> = lto_file_decl_data_get_var_decl (file_data, 
> >>>>> decl_index);
> >>>>>   vec_safe_push (offload_vars, var_decl);
> >>>>>+
> >>>>>+  /* Prevent IPA from removing var_decl as unused, since 
> >>>>>there
> >>>>>+ may be no refs to var_decl in offload LTO mode.  */
> >>>>>+  varpool_node::get (var_decl)->force_output = 1;
> >>>>> }
> >>>
> >>>This doesn't work when there is more than one LTO partition, because only 
> >>>first
> >>>partition contains full offload table to maintain correct order, but 
> >>>cgraph and
> >>>varpool nodes aren't necessarily created for the first partition.  To 
> >>>reproduce:
> >>>
> >>>$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c/for-3.c (internal compiler error)
> >>>FAIL: libgomp.c/for-5.c (internal compiler error)
> >>>FAIL: libgomp.c/for-6.c (internal compiler error)
> >>>$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
> >>>--target_board=unix/-flto"
> >>>FAIL: libgomp.c++/for-11.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-13.C (internal compiler error)
> >>>FAIL: libgomp.c++/for-14.C (internal compiler error)
> >>
> >>This works for me.
> >>
> >>OK for trunk?
> >>
> >>Thanks,
> >>- Tom
> >>
> >
> >>Check that cgraph/varpool_node exists before use in input_offload_tables
> >>
> >>2016-01-26  Tom de Vries  <t...@codesourcery.com>
> >>
> >>* lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
> >>exists before use.
> >
> >In this case they will be not marked as force_output in other partitions 
> >(except
> >the first one).
> 
> AFAIU, that's not the case.
> 
> If we're splitting up lto compilation over partitions, it means we're first
> calling lto1 in WPA mode. We'll read in all offload tables, and mark all
> symbols with force_output, and when writing out the partitions, we'll write
> the offload symbols out with force_output set.
> 
> This updated patch only does the force_output marking for offload symbols in
> WPA or LTO. It's not necessary in LTRANS mode.

You're right, works for me.

  -- Ilya


Re: [PATCH, PR69607] Mark offload symbols as global in lto

2016-02-08 Thread Ilya Verbin
On Mon, Feb 08, 2016 at 14:00:00 +0100, Tom de Vries wrote:
> when running libgomp.c testsuite with "-flto -flto-partition=1to1
> -fno-toplevel-reorder" we run into many compilation failures like this:
> ...
> /tmp/.ltrans0.ltrans.o:(.gnu.offload_funcs+0x1a0): undefined
> reference to `MAIN__._omp_fn.0'
> ...
> 
> The problem is that the offload table is in one lto partition, and the
> function listed in the offload table is in another, without the function
> having been promoted to be visible in the other partition.
> 
> The patch fixes this by promoting the symbols in the offload table such that
> they're visible in all partitions.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Built for nvidia accelerator and reg-tested libgomp with various lto
> settings.

Works fine with intelmic offloading.

  -- Ilya


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-26 Thread Ilya Verbin
On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> On 25/01/16 14:27, Ilya Verbin wrote:
> >On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >>>diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >>>index 62e5454..cdaee41 100644
> >>>--- a/gcc/lto-cgraph.c
> >>>+++ b/gcc/lto-cgraph.c
> >>>@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >>> tree fn_decl
> >>>   = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >>> vec_safe_push (offload_funcs, fn_decl);
> >>>+
> >>>+/* Prevent IPA from removing fn_decl as unreachable, since there
> >>>+   may be no refs from the parent function to child_fn in offload
> >>>+   LTO mode.  */
> >>>+cgraph_node::get (fn_decl)->mark_force_output ();
> >>>   }
> >>> else if (tag == LTO_symtab_variable)
> >>>   {
> >>>@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >>> tree var_decl
> >>>   = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >>> vec_safe_push (offload_vars, var_decl);
> >>>+
> >>>+/* Prevent IPA from removing var_decl as unused, since there
> >>>+   may be no refs to var_decl in offload LTO mode.  */
> >>>+varpool_node::get (var_decl)->force_output = 1;
> >>>   }
> >
> >This doesn't work when there is more than one LTO partition, because only 
> >first
> >partition contains full offload table to maintain correct order, but cgraph 
> >and
> >varpool nodes aren't necessarily created for the first partition.  To 
> >reproduce:
> >
> >$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* 
> >--target_board=unix/-flto"
> >FAIL: libgomp.c/for-3.c (internal compiler error)
> >FAIL: libgomp.c/for-5.c (internal compiler error)
> >FAIL: libgomp.c/for-6.c (internal compiler error)
> >$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* 
> >--target_board=unix/-flto"
> >FAIL: libgomp.c++/for-11.C (internal compiler error)
> >FAIL: libgomp.c++/for-13.C (internal compiler error)
> >FAIL: libgomp.c++/for-14.C (internal compiler error)
> 
> This works for me.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 

> Check that cgraph/varpool_node exists before use in input_offload_tables
> 
> 2016-01-26  Tom de Vries  <t...@codesourcery.com>
> 
>   * lto-cgraph.c (input_offload_tables): Check that cgraph/varpool_node
>   exists before use.

In this case they will not be marked as force_output in other partitions (except
the first one).

  -- Ilya


Re: [PING][PATCH] Mark symbols in offload tables with force_output in read_offload_tables

2016-01-25 Thread Ilya Verbin
Hi!

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >index 62e5454..cdaee41 100644
> >--- a/gcc/lto-cgraph.c
> >+++ b/gcc/lto-cgraph.c
> >@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >   tree fn_decl
> > = lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >   vec_safe_push (offload_funcs, fn_decl);
> >+
> >+  /* Prevent IPA from removing fn_decl as unreachable, since there
> >+ may be no refs from the parent function to child_fn in offload
> >+ LTO mode.  */
> >+  cgraph_node::get (fn_decl)->mark_force_output ();
> > }
> >   else if (tag == LTO_symtab_variable)
> > {
> >@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >   tree var_decl
> > = lto_file_decl_data_get_var_decl (file_data, decl_index);
> >   vec_safe_push (offload_vars, var_decl);
> >+
> >+  /* Prevent IPA from removing var_decl as unused, since there
> >+ may be no refs to var_decl in offload LTO mode.  */
> >+  varpool_node::get (var_decl)->force_output = 1;
> > }

This doesn't work when there is more than one LTO partition, because only the
first partition contains the full offload table (to maintain the correct order),
but cgraph and varpool nodes aren't necessarily created for the first partition.
To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)

  -- Ilya


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-20 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 21:05:47 +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > > How do other accelerators cope with the situation when half of the
> > > > > application is compiled with the accelerator disabled?  (Would some of
> > > > > their calls to GOMP_target_ext lead to abort?)
> > > > 
> > > > GOMP_target_ext should never abort (unless internal error), worst case 
> > > > it
> > > > just falls back into the host fallback.
> > > 
> > > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > > I mean when someone expects that all target regions are executed on the 
> > > device,
> > > but in fact some of them are silently executed on the host with different 
> > > data
> > > environment.
> > 
> > E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> > For XeonPhi we hopefully can offload anything.
> 
> As you said, if compilation of target image fails with ICE or somehow, host
> fallback and offloading to other targets should still work:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
> That patch was not applied, but it can be simulated by -foffload=disable,

I agree that OpenMP doesn't guarantee that all target regions are executed on
the device, but in this case a user can't be sure that some library function
will always offload (because the library might be replaced by a fallback
version), and he/she will have to write something like:

{
  map_data_to_target ();
  some_library1_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library2_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library3_fn_with_offload ();
  unmap_data_from_target ();
}

If you're OK with this, I'll install this patch:


libgomp/
* target.c (gomp_get_target_fn_addr): Allow host fallback if target
function wasn't mapped to the device with non-shared memory.

diff --git a/libgomp/target.c b/libgomp/target.c
index f1f5849..96fe3d5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
   if (tgt_fn == NULL)
-   {
- if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
-   return NULL;
- else
-   gomp_fatal ("Target function wasn't mapped");
-   }
+   return NULL;
 
   return (void *) tgt_fn->tgt_offset;
 }

  -- Ilya


Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Ilya Verbin
On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
>   * task.c (GOMP_PLUGIN_target_task_completion): Free
>   firstprivate_copies.

Also this change caused 3 fails on intelmicemul:

FAIL: libgomp.c/target-32.c execution test
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

That is because ttask->firstprivate_copies is left uninitialized for
!GOMP_OFFLOAD_CAP_SHARED_MEM devices.

(gdb) p ttask->firstprivate_copies
$1 = (void *) 0x1
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x003b076800dc in free () from /lib64/libc.so.6
(gdb) bt
#0  0x003b076800dc in free () from /lib64/libc.so.6
#1  0x77dda871 in GOMP_PLUGIN_target_task_completion (data=0x624ac0) at gcc/libgomp/task.c:585
[...]


OK for trunk?

libgomp/
* task.c (gomp_create_target_task): Set firstprivate_copies to NULL.

diff --git a/libgomp/task.c b/libgomp/task.c
index 0f45c44..38d4e9b 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
   ttask->state = state;
   ttask->task = task;
   ttask->team = team;
+  ttask->firstprivate_copies = NULL;
   task->fn = NULL;
   task->fn_data = ttask;
   task->final_task = 0;
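
For illustration only, the failure mode and the fix can be reproduced outside
libgomp with a toy structure.  This is a minimal sketch, not the libgomp code:
struct toy_target_task and both toy_* functions are made-up names, and only the
one field involved in the crash is modelled.  It relies on the fact that
free (NULL) is a no-op, whereas free () on an indeterminate pointer is
undefined behaviour.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the target task structure; only the field
   relevant to the crash is modelled.  */
struct toy_target_task
{
  void *firstprivate_copies;
};

/* Create a task.  malloc returns indeterminate memory, so the pointer
   is set to NULL explicitly, as the one-line fix does.  */
static struct toy_target_task *
toy_create_target_task (void)
{
  struct toy_target_task *ttask = malloc (sizeof *ttask);
  if (ttask == NULL)
    abort ();
  ttask->firstprivate_copies = NULL;
  return ttask;
}

/* Completion path.  free (NULL) does nothing, so this is safe whether
   or not any copies were ever allocated.  */
static void
toy_task_completion (struct toy_target_task *ttask)
{
  assert (ttask != NULL);
  free (ttask->firstprivate_copies);
  ttask->firstprivate_copies = NULL;
}
```

Without the explicit NULL store, the completion path would hand free () whatever
happened to be in the freshly allocated memory, such as the 0x1 seen in gdb.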

  -- Ilya


Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Ilya Verbin
On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
> diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> index 68f7b2c..58ef595 100644
> --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> @@ -528,7 +528,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const 
> void *src_ptr,
>  
>  extern "C" void
>  GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
> - void *async_data)
> + void **, void *async_data)
>  {
>TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", 
> device,
>tgt_fn, tgt_vars, async_data);
> @@ -544,7 +544,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void 
> *tgt_vars,
>  }
>  
>  extern "C" void
> -GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
> +GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
>  {
>TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, 
> tgt_vars);

This breaks GOMP_OFFLOAD_run.  Committed as obvious.


2016-01-20  Ilya Verbin  <ilya.ver...@intel.com>

liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_run): Pass extra NULL
to GOMP_OFFLOAD_async_run.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 58ef595..57accb4 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -548,5 +548,5 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
 {
   TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);
 
-  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
+  GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL, NULL);
 }


  -- Ilya


Re: [PATCH][RFC][Offloading] Fix PR68463

2016-01-19 Thread Ilya Verbin
On Tue, Jan 19, 2016 at 10:36:28 +0100, Jakub Jelinek wrote:
> On Tue, Jan 19, 2016 at 09:57:01AM +0100, Richard Biener wrote:
> > On Mon, 18 Jan 2016, Ilya Verbin wrote:
> > > On Fri, Jan 15, 2016 at 09:15:01 +0100, Richard Biener wrote:
> > > > On Fri, 15 Jan 2016, Ilya Verbin wrote:
> > > > > II) The __offload_func_table, __offload_funcs_end, 
> > > > > __offload_var_table,
> > > > > __offload_vars_end are now provided by the linker script, instead of
> > > > > crtoffload{begin,end}.o, this allows to surround all offload objects, 
> > > > > even
> > > > > those that are not claimed by lto-plugin.
> > > > > Unfortunately it works only with ld, but doesn't work with gold, 
> > > > > because
> > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> > > > > Any thoughts how to enable this linker script for gold?
> > > > 
> > > > The easiest way would probably to add this handling to the default
> > > > "linker script" in gold.  I don't see an easy way around requiring
> > > > changes to gold here - maybe dumping the default linker script from
> > > > bfd and injecting the rules with some scripting so you have a complete
> > > > script.  Though likely gold won't grok that result.
> > > > 
> > > > Really a question for Ian though.
> > > 
> > > Or the gcc driver can add crtoffload{begin,end}.o, but the problem is 
> > > that it
> > > can't determine whether the program contains offloading or not.  So it 
> > > can add
> > > them to all -fopenmp/-fopenacc programs, if the compiler was configured 
> > > with
> > > --enable-offload-targets=...  The overhead would be about 340 bytes for
> > > binaries which don't use offloading.  Is this acceptable?  (Jakub?)
> > 
> > Can lto-wrapper add them as plugin outputs?  Or does that wreck ordering?

Currently it's implemented this way, but it will not work after my patch,
because e.g. offload-without-lto.o and offload-with-lto.o will be linked in
this order:
offload-without-lto.o, crtoffloadbegin.o, offload-with-lto.o, crtoffloadend.o
^
(will not be claimed by the plugin)

But we need this one:
crtoffloadbegin.o, offload-without-lto.o, offload-with-lto.o, crtoffloadend.o

> Yeah, if that would work, it would be certainly appreciated, one thing is
> wasting .text space and relocations in all -fopenmp programs (for -fopenacc
> programs one kind of assumes there will be some offloading in there),
> another one some extra constructor/destructor or what that would be even
> worse.

They contain only 5 symbols, without constructors/destructors.

  -- Ilya


Re: [PATCH][RFC][Offloading] Fix PR68463

2016-01-18 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 09:15:01 +0100, Richard Biener wrote:
> On Fri, 15 Jan 2016, Ilya Verbin wrote:
> > II) The __offload_func_table, __offload_funcs_end, __offload_var_table,
> > __offload_vars_end are now provided by the linker script, instead of
> > crtoffload{begin,end}.o, this allows to surround all offload objects, even
> > those that are not claimed by lto-plugin.
> > Unfortunately it works only with ld, but doesn't work with gold, because
> > https://sourceware.org/bugzilla/show_bug.cgi?id=15373
> > Any thoughts how to enable this linker script for gold?
> 
> The easiest way would probably to add this handling to the default
> "linker script" in gold.  I don't see an easy way around requiring
> changes to gold here - maybe dumping the default linker script from
> bfd and injecting the rules with some scripting so you have a complete
> script.  Though likely gold won't grok that result.
> 
> Really a question for Ian though.

Or the gcc driver can add crtoffload{begin,end}.o, but the problem is that it
can't determine whether the program contains offloading or not.  So it can add
them to all -fopenmp/-fopenacc programs, if the compiler was configured with
--enable-offload-targets=...  The overhead would be about 340 bytes for
binaries which don't use offloading.  Is this acceptable?  (Jakub?)


> > I used the following testcase:
> > $ cat main.c
> > void foo1 ();
> > void foo2 ();
> > void foo3 ();
> > void foo4 ();
> > 
> > int main ()
> > {
> >   foo1 ();
> >   foo2 ();
> >   foo3 ();
> >   foo4 ();
> >   return 0;
> > }
> > 
> > $ cat test.c
> > #include <stdio.h>
> > #include <omp.h>
> > #define MAKE_FN_NAME(x) foo ## x
> > #define FN_NAME(x) MAKE_FN_NAME(x)
> > void FN_NAME(NUM) ()
> > {
> >   int x, d;
> >   #pragma omp target map(from: x, d)
> > {
> >   x = NUM;
> >   d = omp_is_initial_device ();
> > }
> >   printf ("%s:\t%s ()\tx = %d\n", d ? "HOST" : "TARGET", __FUNCTION__, x);
> >   if (x != NUM)
> > printf ("^\n");
> > }
> > 
> > $ gcc -DNUM=1 -c -flto test.c -o obj1.o
> > $ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
> > $ gcc -DNUM=3 -c test.c -o obj3.o
> > $ gcc -DNUM=4 -c -flto -fopenmp test.c -o obj4.o
> > $ gcc -c main.c -o main.o
> > $ gcc -fopenmp obj1.o obj2.o obj3.o obj4.o main.o && ./a.out
> > $ gcc -fopenmp obj2.o obj3.o obj4.o obj1.o main.o && ./a.out
> > $ gcc -fopenmp obj3.o obj1.o obj2.o obj4.o main.o && ./a.out
> 
> Did you try linking an archive with both offload-but-no-LTO and
> offload-and-LTO objects inside?

No.  And it didn't work, because archives are handled by ld a bit differently.
I will fix it.  Thanks!  From ld/ldlang.c:

/* Find the insert point for the plugin's replacement files.  We
   place them after the first claimed real object file, or if the
   first claimed object is an archive member, after the last real
   object file immediately preceding the archive.

  -- Ilya


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > How do other accelerators cope with the situation when half of the
> > application is compiled with the accelerator disabled?  (Would some of
> > their calls to GOMP_target_ext lead to abort?)
> 
> GOMP_target_ext should never abort (unless internal error), worst case it
> just falls back into the host fallback.

Wouldn't that lead to hard-to-find problems in case of nonshared memory?
I mean when someone expects that all target regions are executed on the device,
but in fact some of them are silently executed on the host with different data
environment.

  -- Ilya


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > How do other accelerators cope with the situation when half of the
> > > > application is compiled with the accelerator disabled?  (Would some of
> > > > their calls to GOMP_target_ext lead to abort?)
> > > 
> > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > just falls back into the host fallback.
> > 
> > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > I mean when someone expects that all target regions are executed on the 
> > device,
> > but in fact some of them are silently executed on the host with different 
> > data
> > environment.
> 
> E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> For XeonPhi we hopefully can offload anything.

As you said, if compilation of the target image fails with an ICE or for some
other reason, host fallback and offloading to other targets should still work:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
That patch was not applied, but it can be simulated with -foffload=disable.
I've created a testcase:

$ cat main.c

#pragma omp declare target
int x;
#pragma omp end declare target
extern int foo ();

int main ()
{
  int shared_mem = 0;
  #pragma omp target map (alloc: x, shared_mem)
{
  x = 10;
  shared_mem = 1;
}

  x = 20;
  int r = foo ();
  if (!shared_mem && r != 100)
__builtin_abort ();
  return 0;
}


$ cat liba.c 

#pragma omp declare target
extern int x;
#pragma omp end declare target

int foo ()
{
  int r;
  #pragma omp target map (from: r) map (alloc: x)
r = x * x;
  return r;
}


$ gcc -fopenmp -fPIC -shared liba.c -o liba.so -foffload=disable
$ gcc -fopenmp -L. -la main.c


Currently it prints "libgomp: Target function wasn't mapped", but after this
change:

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1390,7 +1390,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
   if (tgt_fn == NULL)
-   gomp_fatal ("Target function wasn't mapped");
+   return NULL;

... it will fail at __builtin_abort, but without -foffload=disable it will pass.
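
To make the hazard concrete, here is a minimal model of the testcase above.  It
is only a sketch and not libgomp code: struct toy_env and toy_foo are made-up
names, and the lookup result is passed in as a plain pointer.  The assumption
is a non-shared-memory device whose copy of x is 10 while the host copy is 20,
with a NULL lookup result meaning host fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a non-shared-memory data environment: the host
   and the device each hold their own copy of x.  */
struct toy_env
{
  int host_x;
  int dev_x;
};

/* Model of foo () from liba.c.  TGT_FN stands for the result of the
   target function lookup; NULL means the region was not mapped, so the
   host fallback runs and reads the host copy of x.  */
static int
toy_foo (const struct toy_env *env, const void *tgt_fn)
{
  assert (env != NULL);
  int x = (tgt_fn != NULL) ? env->dev_x : env->host_x;
  return x * x;
}
```

With host_x = 20 and dev_x = 10 the offloaded path yields 100 and the fallback
path yields 400, which is the mismatch that trips the r != 100 check.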

  -- Ilya


[PATCH][RFC][Offloading] Fix PR68463

2016-01-14 Thread Ilya Verbin
Hi!

Here is my attempt to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463

This patch does 2 things:

I) lto-plugin doesn't claim files which contain offload sections, but don't
contain LTO sections.  Instead, it writes names of files with offloading to the
temporary file and passes it to lto-wrapper as -foffload-objects=/tmp/cc...
The order of these files in the list is very important, because ld will link
host objects (and therefore host tables) in the following order:
  1. Non-LTO files before the first claimed LTO file;
  2. LTO files, after WPA-partitioning-recompilation;
  3. Non-LTO files after the first claimed LTO file.
To get the correct matching between host and target tables, the offload objects
need to be reordered correspondingly before passing to the target compiler.
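
The three-group ordering above can be sketched in a few lines of C.  This is
only an illustration, not the lto-plugin code: struct toy_obj and
toy_order_offload_objects are made-up names, and it assumes that LTO objects
keep their relative order within the second group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one object file on the link line.  */
struct toy_obj
{
  const char *name;
  int has_lto;      /* Contains LTO sections, so the plugin claims it.  */
  int has_offload;  /* Contains offload sections.  */
};

/* Store in OUT the names of the objects with offload sections, in the
   order in which ld is expected to link their host tables.  OUT must
   have room for N entries.  Return the number of names stored.  */
static size_t
toy_order_offload_objects (const struct toy_obj *objs, size_t n,
                           const char **out)
{
  size_t i, count = 0, first_lto = n;

  assert (objs != NULL && out != NULL);

  /* Find the first file that the plugin would claim.  */
  for (i = 0; i < n; i++)
    if (objs[i].has_lto)
      {
        first_lto = i;
        break;
      }

  /* 1. Non-LTO files before the first claimed LTO file.  */
  for (i = 0; i < first_lto; i++)
    if (objs[i].has_offload)
      out[count++] = objs[i].name;

  /* 2. LTO files (assumed here to keep their relative order).  */
  for (i = first_lto; i < n; i++)
    if (objs[i].has_lto && objs[i].has_offload)
      out[count++] = objs[i].name;

  /* 3. Non-LTO files after the first claimed LTO file.  */
  for (i = first_lto; i < n; i++)
    if (!objs[i].has_lto && objs[i].has_offload)
      out[count++] = objs[i].name;

  return count;
}
```

Objects without offload sections are skipped, since they contribute nothing to
the offload tables; an LTO-only object still matters because it determines where
the first claimed file is.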

II) The __offload_func_table, __offload_funcs_end, __offload_var_table, and
__offload_vars_end symbols are now provided by the linker script instead of
crtoffload{begin,end}.o.  This makes it possible to surround all offload
objects, even those that are not claimed by lto-plugin.
Unfortunately it works only with ld, but doesn't work with gold, because of
https://sourceware.org/bugzilla/show_bug.cgi?id=15373
Any thoughts on how to enable this linker script for gold?


I used the following testcase:
$ cat main.c
void foo1 ();
void foo2 ();
void foo3 ();
void foo4 ();

int main ()
{
  foo1 ();
  foo2 ();
  foo3 ();
  foo4 ();
  return 0;
}

$ cat test.c
#include <stdio.h>
#include <omp.h>
#define MAKE_FN_NAME(x) foo ## x
#define FN_NAME(x) MAKE_FN_NAME(x)
void FN_NAME(NUM) ()
{
  int x, d;
  #pragma omp target map(from: x, d)
{
  x = NUM;
  d = omp_is_initial_device ();
}
  printf ("%s:\t%s ()\tx = %d\n", d ? "HOST" : "TARGET", __FUNCTION__, x);
  if (x != NUM)
printf ("^\n");
}

$ gcc -DNUM=1 -c -flto test.c -o obj1.o
$ gcc -DNUM=2 -c -fopenmp test.c -o obj2.o
$ gcc -DNUM=3 -c test.c -o obj3.o
$ gcc -DNUM=4 -c -flto -fopenmp test.c -o obj4.o
$ gcc -c main.c -o main.o
$ gcc -fopenmp obj1.o obj2.o obj3.o obj4.o main.o && ./a.out
$ gcc -fopenmp obj2.o obj3.o obj4.o obj1.o main.o && ./a.out
$ gcc -fopenmp obj3.o obj1.o obj2.o obj4.o main.o && ./a.out


gcc/
PR driver/68463
* config/i386/intelmic-mkoffload.c (generate_target_descr_file): Don't
define __offload_func_table and __offload_var_table.
(generate_target_offloadend_file): Remove function.
(prepare_target_image): Don't call generate_target_offloadend_file.
* lto-wrapper.c (offloadbegin, offloadend): Remove static vars.
(offload_objects_file_name): New static var.
(tool_cleanup): Remove offload_objects_file_name file.
(find_offloadbeginend): Rename to ...
(find_crtoffload): ... this.  Locate crtoffload.o instead of
crtoffloadbegin.o and crtoffloadend.o.
(run_gcc): Remove offload_argc and offload_argv.
Get offload_objects_file_name from -foffload-objects=... option.
Read names of object files with offload from this file, pass them to
compile_images_for_offload_targets.  Call find_crtoffload instead of
find_offloadbeginend.  Don't give offload files to the linker when LTO
is disabled, because now they're not claimed, therefore not discarded.
libgcc/
PR driver/68463
* Makefile.in (crtoffloadbegin$(objext)): Remove rule.
(crtoffloadend$(objext)): Likewise.
(crtoffload$(objext), link-offload-tables.x): New rules.
* configure: Regenerate.
* configure.ac (extra_parts): Add link-offload-tables.x if offloading is
enabled, or if this is an accel compiler for intelmic.
* link-offload-tables.x: New file.
* offloadstuff.c: Do not define __offload_func_table,
__offload_var_table, __offload_funcs_end, __offload_vars_end.
libgomp/
PR driver/68463
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac (link_offload_tables): New output variable.  Set to
"%Tlink-offload-tables.x" if offloading is enabled, or if this is an
accel compiler for intelmic.
* libgomp.spec.in (*link_gomp): Add @link_offload_tables@.
* testsuite/Makefile.in: Regenerate.
lto-plugin/
PR driver/68463
* lto-plugin.c (offload_files): Replace with ...
(offload_files_1, offload_files_2, offload_files_3): ... this.
(num_offload_files): Replace with ...
(num_offload_files_1, num_offload_files_2, num_offload_files_3): ..this.
(free_2): Adjust accordingly.
(all_symbols_read_handler): Don't add offload files to lto_arg_ptr.
Don't call free_1 for offload_files.  Write names of object files with
offloading to the temporary file.  Add new option to lto_arg_ptr.
(claim_file_handler): Don't claim file if it contains offload sections
without LTO sections, add it to offload_files_1 or to offload_files_3.
Add files with offload and LTO sections to offload_files_2.


diff 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-14 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > > This looks wrong, both of these clearly could affect anything with
> > > DECL_HAS_VALUE_EXPR_P, not just the link vars.
> > > So, if you need to handle the "omp declare target link" vars specially,
> > > you should only handle those specially and nothing else.  And please try 
> > > to
> > > explain why.
> > 
> > Actually these ifndefs are not needed, because assemble_decl never will be
> > called by accel compiler for original link vars.  I've added a check into
> > output_in_order, but missed a second place where assemble_decl is called -
> > symbol_table::output_variables.  So, fixed now.
> 
> Great.
> 
> > > Do we need to do anything in gomp_unload_image_from_device ?
> > > I mean at least in questionable programs that for link vars don't 
> > > decrement
> > > the refcount of the var that replaced the link var to 0 first before
> > > dlclosing the library.
> > > At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> > > handle it differently.  Perhaps for that case perform a lookup, and if we
> > > get something which has link_map non-NULL, first perform as if there is
> > > target exit data delete (var) on it first?
> > 
> > You're right, it doesn't deallocate memory on the device if DSO leaves 
> > nonzero
> > refcount.  And currently host compiler doesn't set MSB in host_var_table, 
> > it's
> > set only by accel compiler.  But it's possible to do splay_tree_lookup for 
> > each
> > var to determine whether is it linked or not, like in the patch bellow.
> > Or do you prefer to set the bit in host compiler too?  It requires
> > lookup_attribute ("omp declare target link") for all vars in the table 
> > during
> > compilation, but allows to do splay_tree_lookup at run-time only for vars 
> > with
> > MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device 
> > works
> > only for DSO, but it crashed when an executable leaves nonzero refcount, 
> > because
> > target device may be already uninitialized from plugin's __run_exit_handlers
> > (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> > Is it possible do add some atexit (...) to libgomp, which will set 
> > shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
> you perform the same action as would delete(var) do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in host table too is useful, so if you could
> handle that, it would be nice.  And splay_tree_lookup only if the MSB is
> set.
> So,
> if (!host_data_has_msb_set)
>   splay_tree_remove (&devicep->mem_map, &k);
> else
>   {
>     splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
>     if (n->link_key)
>       {
>         n->refcount = 0;
>         n->link_key = NULL;
>         splay_tree_remove (&devicep->mem_map, n);
>         if (n->tgt->refcount > 1)
>           n->tgt->refcount--;
>         else
>           gomp_unmap_tgt (n->tgt);
>       }
>     else
>       splay_tree_remove (&devicep->mem_map, n);
>   }
> or so.

Here is an updated patch.  Now the MSB is set in both tables, and
gomp_unload_image_from_device is changed.  Using a simple DSO testcase, I've
verified that memory on the target is freed after dlclose.
Bootstrap and make check on x86_64-linux passed.
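
For readers unfamiliar with the trick, marking an entry by the most significant
bit of its size field can be sketched as follows.  This is only an
illustration: TOY_LINK_BIT and the toy_* helpers are made-up names and are not
the identifiers used in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Most significant bit of a uintptr_t, used here as the "link" marker.  */
#define TOY_LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Return SIZE with the marker set.  A real size never has this bit set.  */
static uintptr_t
toy_mark_link (uintptr_t size)
{
  assert ((size & TOY_LINK_BIT) == 0);
  return size | TOY_LINK_BIT;
}

/* Nonzero if the table entry describes a "declare target link" var.  */
static int
toy_is_link (uintptr_t entry)
{
  return (entry & TOY_LINK_BIT) != 0;
}

/* Recover the real size from a table entry.  */
static uintptr_t
toy_size (uintptr_t entry)
{
  return entry & ~TOY_LINK_BIT;
}
```

Keeping the marker in the size field means the table layout stays unchanged, and
the run-time lookup is needed only for entries where the bit is set.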


gcc/c-family/
* c-common.c (c_common_attribute_table): Handle "omp declare target
link" attribute.
gcc/
* cgraphunit.c (output_in_order): Do not assemble "omp declare target
link" variables in ACCEL_COMPILER.
* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
"omp declare target link" variables.
* lto/lto.c: Include stringpool.h and fold-const.h.
(offload_handle_link_vars): New static function.
(lto_main): Call offload_handle_link_vars.
* omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
declare target link" variables.
(add_decls_addresses_to_decl_constructor): For "omp declare target link"
variables output address of the artificial pointer instead of address of
the variable.  Set most significant bit of the size to mark them.
(pass_data_omp_target_link): New pass_data.
(pass_omp_targe

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-14 Thread Ilya Verbin
On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
> >  }
> >  
> >gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (&devicep->lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.

Done.

> > +  return NULL;
> > +}
> >  
> >for (i = 0; i < mapnum; i++)
> >  {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
> >  }
> >  
> >gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (&devicep->lock);
> > +  return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.

Done.

> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.

This code can be executed only at program shutdown, so I think that removing
the mappings from the splay tree isn't necessary here; it would only cost time.
Besides, at shutdown we do not remove vars that still have a non-zero refcount.

> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = &devices[i];
> > +  gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +   {
> > + devicep->fini_device_func (devicep->target_id);
> > + devicep->state = GOMP_DEVICE_FINALIZED;
> > +   }
> > +  gomp_mutex_unlock (&devicep->lock);
> > +}
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.

Here is what I've committed to trunk.


libgomp/
* libgomp.h (gomp_device_state): New enum.
(struct gomp_device_descr): Replace is_initialized with state.
(gomp_fini_device): Remove declaration.
* oacc-host.c (host_dispatch): Use state instead of is_initialized.
* oacc-init.c (acc_init_1): Use state instead of is_initialized.
(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
(acc_set_device_type): Use state instead of is_initialized.
(acc_set_device_num): Likewise.
* target.c (resolve_device): Use state instead of is_initialized.
Do not initialize finalized device.
(gomp_map_vars): Do nothing if device is finalized.
(gomp_unmap_vars): Likewise.
(gomp_update): Likewise.
(GOMP_offload_register_ver): Use state instead of is_initialized.
(GOMP_offload_unregister_ver): Likewise.
(gomp_init_device): Likewise.
(gomp_unload_device): Likewise.
(gomp_fini_device): Remove.
(gomp_get_target_fn_addr): Do nothing if device is finalized.
(GOMP_target): Go to host fallback if device is finalized.
(GOMP_target_ext): Likewise.
(gomp_exit_data): Do nothing if device is finalized.
(gomp_target_task_fn): Go to host fallback if device is finalized.
(gomp_target_fini): New static function.
(gomp_target_init): Use state instead of is_initialized.
Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-11 Thread Ilya Verbin
On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> > @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
> >  {
> >struct gomp_device_descr *acc_dev = &base_dev[i];
> >gomp_mutex_lock (&acc_dev->lock);
> > -  if (acc_dev->is_initialized)
> > +  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
> >  {
> >   devices_active = true;
> > - gomp_fini_device (acc_dev);
> > + acc_dev->fini_device_func (acc_dev->target_id);
> > + acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
> > }
> >gomp_mutex_unlock (&acc_dev->lock);
> >  }
> 
> I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
> but I'd leave that to the OpenACC folks to do that incrementally
> once they test it and/or decide what to do.

libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c calls acc_init, then
acc_shutdown, and then acc_init again, so I guess that OpenACC allows the device
to be initialized again after acc_shutdown, whereas GOMP_DEVICE_FINALIZED means
that it is finalized for good.
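
The difference between the two shutdown paths can be sketched as a tiny state
machine.  This is a simplified model, not libgomp code: the toy_* names are
made up, locking is omitted, and the plugin's fini call is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counterpart of enum gomp_device_state.  */
enum toy_device_state
{
  TOY_DEVICE_UNINITIALIZED,
  TOY_DEVICE_INITIALIZED,
  TOY_DEVICE_FINALIZED
};

/* Try to bring the device up.  Return 1 if it is usable afterwards,
   0 if it has been finalized and must not be initialized again.  */
static int
toy_init_device (enum toy_device_state *state)
{
  assert (state != NULL);
  if (*state == TOY_DEVICE_FINALIZED)
    return 0;
  *state = TOY_DEVICE_INITIALIZED;
  return 1;
}

/* acc_shutdown-like path: the device may be initialized again later.  */
static void
toy_acc_shutdown (enum toy_device_state *state)
{
  assert (state != NULL);
  if (*state == TOY_DEVICE_INITIALIZED)
    *state = TOY_DEVICE_UNINITIALIZED;
}

/* Exit-time path: finalization is terminal.  */
static void
toy_target_fini (enum toy_device_state *state)
{
  assert (state != NULL);
  if (*state == TOY_DEVICE_INITIALIZED)
    *state = TOY_DEVICE_FINALIZED;
}
```

In this model an init, shutdown, init sequence succeeds both times, while any
init attempted after the exit-time path is refused.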

> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
> >  }
> >  
> >gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (&devicep->lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.
> 
> > +  return NULL;
> > +}
> >  
> >for (i = 0; i < mapnum; i++)
> >  {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
> >  }
> >  
> >gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (&devicep->lock);
> > +  return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.
> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.
> 
> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = &devices[i];
> > +  gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +   {
> > + devicep->fini_device_func (devicep->target_id);
> > + devicep->state = GOMP_DEVICE_FINALIZED;
> > +   }
> > +  gomp_mutex_unlock (&devicep->lock);
> > +}
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.
> 
> What is the state of the link clause implementation patch?  Does it depend
> on this?

It's ready, but it depends on this.  I will retest and resend "link" patch after
checking-in "init/fini" patch.

  -- Ilya


Re: [6/6] OpenMP 4.0 library testsuite

2015-12-08 Thread Ilya Verbin
Hi!

On Tue, Oct 08, 2013 at 21:54:47 +0200, Jakub Jelinek wrote:
>   * testsuite/libgomp.c++/udr-1.C: New test.
>   * testsuite/libgomp.c++/udr-3.C: New test.
>   * testsuite/libgomp.c++/udr-9.C: New test.

I've just noticed that these tests fail with -flto (on latest trunk and on Oct
2013).

FAIL: libgomp.c++/udr-1.C (internal compiler error)
FAIL: libgomp.c++/udr-3.C (internal compiler error)
FAIL: libgomp.c++/udr-9.C (internal compiler error)
FAIL: libgomp.c++/udr-11.C (internal compiler error)
FAIL: libgomp.c++/udr-13.C (internal compiler error)
FAIL: libgomp.c++/udr-19.C (internal compiler error)

libgomp/testsuite/libgomp.c++/udr-1.C:56:13: internal compiler error: in 
discriminator_for_local_entity, at cp/mangle.c:1762
0x972ace discriminator_for_local_entity
gcc/cp/mangle.c:1762
0x972e49 write_local_name
gcc/cp/mangle.c:1850
0x96d079 write_name
gcc/cp/mangle.c:882
0x96c8c4 write_encoding
gcc/cp/mangle.c:744
0x96c365 write_mangled_name
gcc/cp/mangle.c:709
0x97c152 mangle_decl_string
gcc/cp/mangle.c:3509
0x97c198 get_mangled_id
gcc/cp/mangle.c:3531
0x97c622 mangle_decl(tree_node*)
gcc/cp/mangle.c:3598
0x12e3d77 decl_assembler_name(tree_node*)
gcc/tree.c:670
0x12f778d assign_assembler_name_if_neeeded(tree_node*)
gcc/tree.c:5879
0x12f78e5 free_lang_data_in_cgraph
gcc/tree.c:5934
0x12f7a99 free_lang_data
gcc/tree.c:5976
0x12f7b38 execute
gcc/tree.c:6025
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

  -- Ilya


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-08 Thread Ilya Verbin
On Tue, Dec 01, 2015 at 20:05:04 +0100, Jakub Jelinek wrote:
> This is racy, tsan would tell you so.
> Instead of a global var, I'd just change the devicep->is_initialized 
> field from bool into a 3 state field (perhaps enum), with states
> uninitialized, initialized, finalized, and then say in resolve_device,
> 
>   gomp_mutex_lock (&devices[device_id].lock);
>   if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
>     gomp_init_device (&devices[device_id]);
>   else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
>     {
>       gomp_mutex_unlock (&devices[device_id].lock);
>       return NULL;
>     }
>   gomp_mutex_unlock (&devices[device_id].lock);
> 
> Though, of course, that is incomplete, because resolve_device takes one
> lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
> So I think either we want to rewrite the locking, such that say
> resolve_device returns a locked device and then you perform stuff on the
> locked device (disadvantage is that gomp_map_vars will call gomp_malloc
> with the lock held, which can take some time to allocate the memory),
> or there needs to be the possibility that gomp_map_vars rechecks if the
> device has not been finalized after taking the lock and returns to the
> caller if the device has been finalized in between resolve_device and
> gomp_map_vars.

This patch implements the second approach.  Is it OK?
Bootstrap and make check-target-libgomp passed.
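
For readers skimming the thread, the "recheck after taking the lock" idea can be
sketched roughly as below.  This is only an illustration under assumptions:
struct device, struct mapping and map_vars are made-up names, and a pthread mutex
stands in for gomp_mutex_t.  The allocation happens without the lock held, and the
descriptor is freed again if the device turns out to have been finalized in
between.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical types, not the libgomp ones.  */
enum dev_state { DEV_UNINITIALIZED, DEV_INITIALIZED, DEV_FINALIZED };

struct device
{
  pthread_mutex_t lock;
  enum dev_state state;
};

struct mapping
{
  struct device *dev;
  size_t n;
};

/* Allocate the descriptor outside the lock, then recheck the device state
   under the lock.  Returns NULL if the device has been finalized meanwhile
   (or if the allocation failed).  */
static struct mapping *
map_vars (struct device *dev, size_t n)
{
  struct mapping *m = malloc (sizeof *m);
  if (m == NULL)
    return NULL;
  m->dev = dev;
  m->n = n;

  pthread_mutex_lock (&dev->lock);
  if (dev->state == DEV_FINALIZED)
    {
      pthread_mutex_unlock (&dev->lock);
      free (m);  /* Do not leak the descriptor on the early return.  */
      return NULL;
    }
  /* ... the real code would register the mappings here ...  */
  pthread_mutex_unlock (&dev->lock);
  return m;
}
```

The caller then has to treat a NULL result as "device gone" and either do nothing
or go to host fallback.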


libgomp/
* libgomp.h (gomp_device_state): New enum.
(struct gomp_device_descr): Replace is_initialized with state.
(gomp_fini_device): Remove declaration.
* oacc-host.c (host_dispatch): Use state instead of is_initialized.
* oacc-init.c (acc_init_1): Use state instead of is_initialized.
(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
(acc_set_device_type): Use state instead of is_initialized.
(acc_set_device_num): Likewise.
* target.c (resolve_device): Use state instead of is_initialized.
Do not initialize finalized device.
(gomp_map_vars): Do nothing if device is finalized.
(gomp_unmap_vars): Likewise.
(gomp_update): Likewise.
(GOMP_offload_register_ver): Use state instead of is_initialized.
(GOMP_offload_unregister_ver): Likewise.
(gomp_init_device): Likewise.
(gomp_unload_device): Likewise.
(gomp_fini_device): Remove.
(gomp_get_target_fn_addr): Do nothing if device is finalized.
(GOMP_target): Go to host fallback if device is finalized.
(GOMP_target_ext): Likewise.
(gomp_exit_data): Do nothing if device is finalized.
(gomp_target_task_fn): Go to host fallback if device is finalized.
(gomp_target_fini): New static function.
(gomp_target_init): Use state instead of is_initialized.
Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED 
state
+ back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+ to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
@@ -962,7 +972,6 @@ extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
-extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 9874804..d289b38 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -222,7 +222,7 @@ static struct gomp_device_descr host_dispatch =
 
 .mem_map = { NULL },
 /* .lock initilized in goacc_host_init.  */
-.is_initialized = false,
+.state = 

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin
On Tue, Dec 01, 2015 at 14:15:59 +0100, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> > > On 01 Dec 2015, at 11:18, Jakub Jelinek <ja...@redhat.com> wrote:
> > >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> > >> Ok, but it doesn't solve the issue with doing it for the executable, 
> > >> because
> > >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized 
> > >> device.
> > > 
> > > ?? You mean that the
> > > devicep->unload_image_func (devicep->target_id, version, target_data);
> > > call deinitializes the device or something else (I mean, if there is some
> > > other tgt, then it had to be initialized)?
> > 
> > No, I mean that it can be deinitialized from plugin's __run_exit_handlers 
> > (see my last mail with the patch).
> 
> Then the bug is that you have too many atexit registered handlers that
> perform some finalization, better would be to have a single one that
> performs everything in order.
> 
> Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
> intelmic plugin) to set some flag and ignore free_func calls when the flag
> is set or something like that.
> 
> Note library destructors can also use OpenMP code in them, similarly C++
> dtors etc., so when you at some point finalize certain device, you should
> arrange for newer events on the device to be ignored and new offloadings to
> go to host fallback.

So I guess the decision to do host fallback should be made in resolve_device,
rather than in plugins (in free_func and all others).  Is this patch OK?
make check-target-libgomp passes both using emul and hw, and offloading from
dlopened libs also works fine.
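
In isolation, the intended behaviour is roughly the following sketch.  All names
here (resolve_device_sketch, target_fini_sketch, run_region, count_call) are
hypothetical, and the plain unsynchronized flag is kept only for brevity; it is
not meant as a thread-safe design.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct device { int id; };

static struct device the_device = { 0 };

/* True once the offloading runtime has been finalized.  */
static bool finalized;

/* Single finalization hook; the sketch only flips the flag.  */
static void
target_fini_sketch (void)
{
  finalized = true;
}

/* Register the hook once, e.g. at the end of runtime initialization.
   Returns 0 on success, like atexit.  */
static int
target_init_sketch (void)
{
  return atexit (target_fini_sketch);
}

/* The decision is made in one place: no device after finalization.  */
static struct device *
resolve_device_sketch (void)
{
  return finalized ? NULL : &the_device;
}

/* Run FN either "on the device" (returns 1) or on the host (returns 0).
   The sketch calls FN directly in both cases.  */
static int
run_region (void (*fn) (void *), void *data)
{
  struct device *dev = resolve_device_sketch ();
  if (dev == NULL)
    {
      fn (data);  /* Host fallback.  */
      return 0;
    }
  fn (data);  /* A real runtime would launch on DEV here.  */
  return 1;
}

/* Small helper for trying the sketch: counts how often it was called.  */
static void
count_call (void *p)
{
  ++*(int *) p;
}
```

Code that runs late (library destructors, C++ dtors) then still executes its
target regions, just on the host.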


libgomp/
* target.c (finalized): New static variable.
(resolve_device): Do nothing when finalized is true.
(GOMP_offload_register_ver): Likewise.
(GOMP_offload_unregister_ver): Likewise.
(gomp_target_fini): New static function.
(gomp_target_init): Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..320178e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -78,6 +78,10 @@ static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* True when offloading runtime is finalized.  */
+static bool finalized;
+
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -108,6 +112,9 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device_id)
 {
+  if (finalized)
+return NULL;
+
   if (device_id == GOMP_DEVICE_ICV)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -1095,6 +1102,9 @@ GOMP_offload_register_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
 gomp_fatal ("Library too old for offload (version %u < %u)",
GOMP_VERSION, GOMP_VERSION_LIB (version));
@@ -1143,6 +1153,9 @@ GOMP_offload_unregister_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
@@ -2282,6 +2295,24 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
*device,
   return 0;
 }
 
+/* This function finalizes the runtime needed for offloading and all 
initialized
+   devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  finalized = true;
+
+  int i;
+  for (i = 0; i < num_devices; i++)
+{
+  struct gomp_device_descr *devicep = &devices[i];
+  gomp_mutex_lock (&devicep->lock);
+  gomp_fini_device (devicep);
+  gomp_mutex_unlock (&devicep->lock);
+}
+}
+
 /* This function initializes the runtime needed for offloading.
It parses the list of offload targets and tries to load the plugins for
these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2387,6 +2418,9 @@ gomp_target_init (void)
   if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
goacc_register (&devices[i]);
 }
+
+  if (atexit (gomp_target_fini) != 0)
+gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint6

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin

> On 01 Dec 2015, at 11:18, Jakub Jelinek <ja...@redhat.com> wrote:
> 
>> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
>> Ok, but it doesn't solve the issue with doing it for the executable, because
>> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> 
> ?? You mean that the
> devicep->unload_image_func (devicep->target_id, version, target_data);
> call deinitializes the device or something else (I mean, if there is some
> other tgt, then it had to be initialized)?

No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see 
my last mail with the patch).

  -- Ilya

Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 13:04:59 +0100, Jakub Jelinek wrote:
> On Fri, Nov 27, 2015 at 07:50:09PM +0300, Ilya Verbin wrote:
> > + /* Most significant bit of the size marks such vars.  */
> > + unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > + isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> 
> That supposedly should be BITS_PER_UNIT instead of 8.

Fixed.

> > diff --git a/gcc/varpool.c b/gcc/varpool.c
> > index 36f19a6..cbd1e05 100644
> > --- a/gcc/varpool.c
> > +++ b/gcc/varpool.c
> > @@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
> >   are not real variables, but just info for debugging and codegen.
> >   Unfortunately at the moment emutls is not updating varpool correctly
> >   after turning real vars into value_expr vars.  */
> > +#ifndef ACCEL_COMPILER
> >if (DECL_HAS_VALUE_EXPR_P (decl)
> >&& !targetm.have_tls)
> >  return false;
> > +#endif
> >  
> >/* Hard register vars do not need to be output.  */
> >if (DECL_HARD_REGISTER (decl))
> >  return false;
> >  
> > +#ifndef ACCEL_COMPILER
> >gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
> >&& TREE_CODE (decl) == VAR_DECL
> >&& !DECL_HAS_VALUE_EXPR_P (decl));
> > +#endif
> 
> This looks wrong, both of these clearly could affect anything with
> DECL_HAS_VALUE_EXPR_P, not just the link vars.
> So, if you need to handle the "omp declare target link" vars specially,
> you should only handle those specially and nothing else.  And please try to
> explain why.

Actually these ifndefs are not needed, because assemble_decl will never be
called by the accel compiler for original link vars.  I've added a check into
output_in_order, but missed a second place where assemble_decl is called:
symbol_table::output_variables.  So, fixed now.

> > @@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr 
> > *devicep, unsigned version,
> >for (i = 0; i < num_vars; i++)
> >  {
> >struct addr_pair *target_var = _table[num_funcs + i];
> > -  if (target_var->end - target_var->start
> > - != (uintptr_t) host_var_table[i * 2 + 1])
> > +  uintptr_t target_size = target_var->end - target_var->start;
> > +
> > +  /* Most significant bit of the size marks "omp declare target link"
> > +variables.  */
> > +  bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));
> 
> __CHAR_BIT__ here instead of 8?

Fixed.
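
For reference, the marking scheme itself is tiny.  Here is a stand-alone sketch
using CHAR_BIT instead of a hard-coded 8; LINK_BIT, mark_link, is_link and
real_size are hypothetical names, not what the patch uses.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of a pointer-sized integer.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Mark SIZE as belonging to an "omp declare target link" variable.  */
static uintptr_t
mark_link (uintptr_t size)
{
  return size | LINK_BIT;
}

/* True if SIZE carries the link marker.  */
static bool
is_link (uintptr_t size)
{
  return (size & LINK_BIT) != 0;
}

/* Strip the marker to recover the real size.  */
static uintptr_t
real_size (uintptr_t size)
{
  return size & ~LINK_BIT;
}
```

This relies on no real object having a size with the top bit set, which is the
same assumption the patch makes.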

> > @@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
> > *devicep, unsigned version,
> >k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
> >k->tgt = tgt;
> >k->tgt_offset = target_var->start;
> > -  k->refcount = REFCOUNT_INFINITY;
> > +  k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
> >k->async_refcount = 0;
> >array->left = NULL;
> >array->right = NULL;
> 
> Do we need to do anything in gomp_unload_image_from_device ?
> I mean at least in questionable programs that for link vars don't decrement
> the refcount of the var that replaced the link var to 0 first before
> dlclosing the library.
> At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> handle it differently.  Perhaps for that case perform a lookup, and if we
> get something which has link_map non-NULL, first perform as if there is
> target exit data delete (var) on it first?

You're right, it doesn't deallocate memory on the device if a DSO leaves a nonzero
refcount.  And currently the host compiler doesn't set the MSB in host_var_table;
it's set only by the accel compiler.  But it's possible to do splay_tree_lookup for
each var to determine whether it is linked or not, like in the patch below.
Or do you prefer to set the bit in the host compiler too?  It requires
lookup_attribute ("omp declare target link") for all vars in the table during
compilation, but allows doing splay_tree_lookup at run-time only for vars with
the MSB set in host_var_table.
Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
only for a DSO; it crashes when an executable leaves a nonzero refcount, because
the target device may already be uninitialized from the plugin's
__run_exit_handlers (and it is in the case of intelmic), so gomp_exit_data cannot
run free_func.
Is it possible to add some atexit (...) to libgomp, which will set a shutting_down
flag, and just do nothing in gomp_unload_image_from_device if it is set?


diff --git a/gcc/c

Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-30 Thread Ilya Verbin
On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > You're right, it doesn't deallocate memory on the device if DSO leaves 
> > nonzero
> > refcount.  And currently host compiler doesn't set MSB in host_var_table, 
> > it's
> > set only by accel compiler.  But it's possible to do splay_tree_lookup for 
> > each
> > var to determine whether is it linked or not, like in the patch bellow.
> > Or do you prefer to set the bit in host compiler too?  It requires
> > lookup_attribute ("omp declare target link") for all vars in the table 
> > during
> > compilation, but allows to do splay_tree_lookup at run-time only for vars 
> > with
> > MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device 
> > works
> > only for DSO, but it crashed when an executable leaves nonzero refcount, 
> > because
> > target device may be already uninitialized from plugin's __run_exit_handlers
> > (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> > Is it possible do add some atexit (...) to libgomp, which will set 
> > shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
> you perform the same action as would delete(var) do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in host table too is useful, so if you could
> handle that, it would be nice.  And splay_tree_lookup only if the MSB is
> set.
> So,
> if (!host_data_has_msb_set)
>   splay_tree_remove (>mem_map, );
> else
>   {
> splay_tree_key n = splay_tree_lookup (>mem_map, );
> if (n->link_key)
> {
>   n->refcount = 0;
>   n->link_key = NULL;
>   splay_tree_remove (>mem_map, n);
>   if (n->tgt->refcount > 1)
> n->tgt->refcount--;
>   else
> gomp_unmap_tgt (n->tgt);
> }
>   else
> splay_tree_remove (>mem_map, n);
>   }
> or so.

Ok, but it doesn't solve the issue with doing it for the executable, because
gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
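
To make the problem concrete, here is a rough model of the release step from the
pseudocode above.  The types and the "alive" flag are hypothetical and are not
libgomp's; the sketch only shows the point at which a free on the device would be
attempted after the device has already gone away.

```c
#include <stdbool.h>

/* Hypothetical model: a device that may already be shut down, and a target
   memory descriptor with a reference count.  */
struct device
{
  bool alive;  /* False once the plugin has torn the device down.  */
  int frees;   /* Number of device-side frees performed.  */
};

struct tgt
{
  struct device *dev;
  int refcount;
  bool unmapped;
};

/* Drop one reference.  When the last reference goes away the descriptor is
   unmapped, which needs a device-side free.  Returns false if that free
   could not be performed because the device is no longer alive.  */
static bool
tgt_release (struct tgt *t)
{
  if (t->refcount > 1)
    {
      t->refcount--;
      return true;
    }
  t->refcount = 0;
  t->unmapped = true;
  if (!t->dev->alive)
    return false;  /* This is the case an executable hits at exit.  */
  t->dev->frees++;
  return true;
}
```

For a DSO unloaded via dlclose the device is still alive, so the last release
succeeds; for the executable the last release happens after the plugin's exit
handlers have run.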

  -- Ilya


Re: [RFC] Getting LTO incremental linking work

2015-11-28 Thread Ilya Verbin
2015-11-28 14:01 GMT+03:00 Tom de Vries :
> This patch fixes the failures. I'm not sure if this is the right or complete
> fix though.

I think it's ok, at least until we decide how to rework the offloading
stuff in lto-wrapper (see PR68463).

Thanks,
  -- Ilya


Re: [gomp4.5] Handle #pragma omp declare target link

2015-11-27 Thread Ilya Verbin
On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >   decl = OMP_CLAUSE_DECL (c);
> >   /* Global variables with "omp declare target" attribute
> >  don't need to be copied, the receiver side will use them
> > -directly.  */
> > +directly.  However, global variables with "omp declare target link"
> > +attribute need to be copied.  */
> >   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> >   && DECL_P (decl)
> >   && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >!= GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> >   || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> >   && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > - && varpool_node::get_create (decl)->offloadable)
> > + && varpool_node::get_create (decl)->offloadable
> > + && !lookup_attribute ("omp declare target link",
> > +   DECL_ATTRIBUTES (decl)))
> 
> I wonder if Honza/Richi wouldn't prefer to have this info also
> in cgraph, instead of looking up the attribute in each case.

So should I add a new flag into cgraph?
Also it is used in gimplify_adjust_omp_clauses.

> > +  if (var.link_ptr_decl == NULL_TREE)
> > +   addr = build_fold_addr_expr (var.decl);
> > +  else
> > +   {
> > + /* For "omp declare target link" var use address of the pointer
> > +instead of address of the var.  */
> > + addr = build_fold_addr_expr (var.link_ptr_decl);
> > + /* Most significant bit of the size marks such vars.  */
> > + unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > + isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> > + size = wide_int_to_tree (const_ptr_type_node, isize);
> > +
> > + /* FIXME: Remove varpool node of var?  */
> 
> There is varpool_node::remove (), but not sure if at this point all the
> references are already gone.

Actually removing varpool node here will not remove var from the target code, so
I've added a check in cgraphunit.c before assemble_decl ().

> > +class pass_omp_target_link : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_omp_target_link (gcc::context *ctxt)
> > +: gimple_opt_pass (pass_data_omp_target_link, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *fun)
> > +{
> > +#ifdef ACCEL_COMPILER
> > +  /* FIXME: Replace globals in target regions too or not?  */
> > +  return lookup_attribute ("omp declare target",
> > +  DECL_ATTRIBUTES (fun->decl));
> 
> Certainly in "omp declare target entrypoint" regions too.

Done.

> > +unsigned
> > +pass_omp_target_link::execute (function *fun)
> > +{
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, fun)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> > +   {
> > + unsigned i;
> > + gimple *stmt = gsi_stmt (gsi);
> > + for (i = 0; i < gimple_num_ops (stmt); i++)
> > +   {
> > + tree op = gimple_op (stmt, i);
> > + tree var = NULL_TREE;
> > +
> > + if (!op)
> > +   continue;
> > + if (TREE_CODE (op) == VAR_DECL)
> > +   var = op;
> > + else if (TREE_CODE (op) == ADDR_EXPR)
> > +   {
> > + tree op1 = TREE_OPERAND (op, 0);
> > + if (TREE_CODE (op1) == VAR_DECL)
> > +   var = op1;
> > +   }
> > + /* FIXME: Support arrays.  What else?  */
> 
> We need to support all the references to the variables.
> So, I think this approach is not right.
> 
> > +
> > + if (var && lookup_attribute ("omp declare target link",
> > +  DECL_ATTRIBUTES (var)))
> > +   {
> > + tree type = TREE_TYPE (var);
> > + tree ptype = build_pointer_type (type);
> > +
> > + /* Find var in offload table.  */
> > + omp_offload_var *table_entry = NULL;
> > + for (unsigned j = 0; j < vec_safe_lengt

Re: Enable pointer TBAA for LTO

2015-11-23 Thread Ilya Verbin
On Mon, Nov 23, 2015 at 16:31:42 +0100, Richard Biener wrote:
> I think it also causes the following and one related ICE
> 
> FAIL: gcc.dg/vect/pr62021.c -flto -ffat-lto-objects (internal compiler 
> error)
> 
> /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.dg/vect/pr62021.c:7:1: 
> internal compiler error: in get_alias_set, at alias.c:880
> 0x7528a7 get_alias_set(tree_node*)
> /space/rguenther/src/svn/trunk3/gcc/alias.c:880
> 0x751ce5 component_uses_parent_alias_set_from(tree_node const*)
> /space/rguenther/src/svn/trunk3/gcc/alias.c:635
> 0x7522ad reference_alias_ptr_type_1
> /space/rguenther/src/svn/trunk3/gcc/alias.c:747
> 0x752683 get_alias_set(tree_node*)
> ...

And an ICE in intelmicemul offloading compiler:

FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)

libgomp/testsuite/libgomp.c/for-2.h:201:9: internal compiler error: in 
get_alias_set, at alias.c:880
0x710eef get_alias_set(tree_node*)
gcc/alias.c:880
0x71032d component_uses_parent_alias_set_from(tree_node const*)
gcc/alias.c:635
0x7108f5 reference_alias_ptr_type_1
gcc/alias.c:747
0x710ccb get_alias_set(tree_node*)
gcc/alias.c:843
0x89d208 expand_assignment(tree_node*, tree_node*, bool)
gcc/expr.c:5020
0x768ff7 expand_gimple_stmt_1
gcc/cfgexpand.c:3592
0x7693e2 expand_gimple_stmt
gcc/cfgexpand.c:3688
0x7704ed expand_gimple_basic_block
gcc/cfgexpand.c:5694
0x771ff1 execute
gcc/cfgexpand.c:6309
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

  -- Ilya


Re: [PATCH 12/12] always define ENABLE_OFFLOADING

2015-11-23 Thread Ilya Verbin
On Mon, Nov 09, 2015 at 19:41:21 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> >-#ifdef ENABLE_OFFLOADING
> >/* If the user didn't specify any, default to all configured offload
> >   targets.  */
> >if (offload_targets == NULL)
> >  handle_foffload_option (OFFLOAD_TARGETS);
> >-#endif
> 
> This one I would keep guarded with an if.
> 
> Otherwise ok modulo stage 1 end.

There are 2 new uses of "#ifdef ENABLE_OFFLOADING" in c_parser_oacc_declare and
cp_parser_oacc_declare.
I don't know how to properly test OpenACC, so here is an untested patch.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7b10764..1dc0bd5 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13473,14 +13473,15 @@ c_parser_oacc_declare (c_parser *parser)
  if (node != NULL)
{
  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
- g->have_offload = true;
- if (is_a  (node))
+ if (ENABLE_OFFLOADING)
{
- vec_safe_push (offload_vars, decl);
- node->force_output = 1;
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, decl);
+ node->force_output = 1;
+   }
}
-#endif
}
}
}
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 24ed404..a9c0a45 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34633,14 +34633,15 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token 
*pragma_tok)
  if (node != NULL)
{
  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
- g->have_offload = true;
- if (is_a  (node))
+ if (ENABLE_OFFLOADING)
{
- vec_safe_push (offload_vars, decl);
- node->force_output = 1;
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, decl);
+ node->force_output = 1;
+   }
}
-#endif
}
}
}
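
The pattern used in both hunks, shown in isolation as a sketch: the macro is
assumed to be always defined to 0 or 1 (here with a fallback default of 0 so
that the fragment compiles on its own), and struct node and mark_offloadable are
made-up names.  With a plain "if" both arms are always parsed and type-checked,
and the compiler drops the dead one.

```c
#include <stdbool.h>

/* Assumption for this sketch: the build system always defines the macro to
   0 or 1; default to 0 so the fragment is self-contained.  */
#ifndef ENABLE_OFFLOADING
#define ENABLE_OFFLOADING 0
#endif

/* Hypothetical stand-in for a symbol table node.  */
struct node
{
  bool offloadable;
  bool force_output;
};

static void
mark_offloadable (struct node *n)
{
  n->offloadable = true;
  /* Compiled in every configuration, unlike code under #ifdef.  */
  if (ENABLE_OFFLOADING)
    n->force_output = true;
}
```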

  -- Ilya


Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-11-20 Thread Ilya Verbin
On Wed, Dec 10, 2014 at 01:48:21 +0300, Ilya Verbin wrote:
> On 09 Dec 14:59, Richard Biener wrote:
> > On Mon, 8 Dec 2014, Ilya Verbin wrote:
> > > Unfortunately, this fix was not general enough.
> > > There might be cases when mixed object files get into lto-wrapper, ie 
> > > some of
> > > them contain only LTO sections, some contain only offload sections, and 
> > > some
> > > contain both.  But when lto-wrapper will pass all these files to 
> > > recompilation,
> > > the compiler might crash (it depends on the order of input files), since 
> > > in
> > > read_cgraph_and_symbols it expects that *all* input files contain IR 
> > > section of
> > > given type.
> > > This patch splits input objects from argv into lto_argv and offload_argv, 
> > > so
> > > that all files in arrays contain corresponding IR.
> > > Similarly, in lto-plugin, it was bad idea to add objects, which contain 
> > > offload
> > > IR without LTO, to claimed_files, since this may corrupt a resolution 
> > > file.
> > > 
> > > Tested on various combinations of files with/without -flto and 
> > > with/without
> > > offload, using trunk ld and gold, also tested on ld without plugin 
> > > support.
> > > Bootstrap and make check passed on x86_64-linux and i686-linux.  Ok for 
> > > trunk?
> > 
> > Did you check that bootstrap-lto still works?  Ok if so.
> 
> Yes, bootstrap-lto passed.
> Committed revision 218543.

I don't know how I missed this a year ago, but mixing of LTO objects with
offloading-without-LTO objects still doesn't work :(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463 filed about that.
Any thoughts how to fix this?

Thanks,
  -- Ilya


Re: [PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-11-19 Thread Ilya Verbin
On Thu, Nov 19, 2015 at 14:33:06 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 08:33:28PM +0300, Ilya Verbin wrote:
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 772e198..6ee585e 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > @@ -65,6 +65,17 @@ typedef std::vector DevAddrVect;
> >  /* Addresses for all images and all devices.  */
> >  typedef std::map ImgDevAddrMap;
> >  
> > +/* Image descriptor needed by __offload_[un]register_image.  */
> > +struct TargetImageDesc {
> > +  int64_t size;
> > +  /* 10 characters is enough for max int value.  */
> > +  char name[sizeof ("lib00.so")];
> > +  char data[];
> > +} __attribute__ ((packed));
> 
> Why the packed attribute?  I know it is preexisting, but with int64_t
> being the first and then just char, there is no padding in between fields.

Hmmm, I can't remember, but I definitely have added this attribute 2 years ago,
because liboffloadmic failed to register the image.  Anyway, now everything
works fine without it.

> And to determine the size without data, you can just use offsetof.

I will add this:

diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6ee585e..f8c1725 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -71,7 +71,7 @@ struct TargetImageDesc {
   /* 10 characters is enough for max int value.  */
   char name[sizeof ("lib00.so")];
   char data[];
-} __attribute__ ((packed));
+};
 
 /* Image descriptors, indexed by a pointer obtained from libgomp.  */
 typedef std::map ImgDescMap;
@@ -313,9 +313,8 @@ offload_image (const void *target_image)
 target_image, image_start, image_end);
 
   int64_t image_size = (uintptr_t) image_end - (uintptr_t) image_start;
-  TargetImageDesc *image
-= (TargetImageDesc *) malloc (sizeof (int64_t) + sizeof 
("lib00.so")
- + image_size);
+  TargetImageDesc *image = (TargetImageDesc *) malloc (offsetof 
(TargetImageDesc, data)
+  + image_size);
   if (!image)
 {
   fprintf (stderr, "%s: Can't allocate memory\n", __FILE__);
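
As a stand-alone illustration of the offsetof-based allocation (a sketch only:
struct image_desc, NAME_LEN and image_desc_new are hypothetical and do not match
the plugin's TargetImageDesc), the header part is sized with offsetof and the
payload is appended in the flexible array member:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name length, chosen only for this sketch.  */
#define NAME_LEN 16

struct image_desc
{
  int64_t size;
  char name[NAME_LEN];
  char data[];  /* Flexible array member holding the image bytes.  */
};

/* Allocate a descriptor with room for SIZE payload bytes and fill it in.
   Returns NULL if the allocation fails.  */
static struct image_desc *
image_desc_new (const char *name, const void *data, size_t size)
{
  struct image_desc *d = malloc (offsetof (struct image_desc, data) + size);
  if (d == NULL)
    return NULL;
  d->size = (int64_t) size;
  strncpy (d->name, name, NAME_LEN - 1);
  d->name[NAME_LEN - 1] = '\0';
  memcpy (d->data, data, size);
  return d;
}
```

Since an int64_t followed by char arrays has no padding between the fields, the
offset of data is just the sum of the two preceding sizes, with or without the
packed attribute.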


> > @@ -217,13 +231,27 @@ offload (const char *file, uint64_t line, int device, 
> > const char *name,
> >  }
> >  
> >  static void
> > +unregister_main_image ()
> > +{
> > +  __offload_unregister_image (_target_image);
> > +}
> > +
> > +static void
> >  register_main_image ()
> >  {
> > +  /* Do not check the return value, because old versions of liboffloadmic 
> > did
> > + not have return values.  */
> >__offload_register_image (_target_image);
> >  
> >/* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
> >   asynchronous task on target is completed.  */
> >__offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
> > +
> > +  if (atexit (unregister_main_image) != 0)
> > +{
> > +  fprintf (stderr, "%s: atexit failed\n", __FILE__);
> > +  exit (1);
> > +}
> >  }
> 
> What is the point of this hunk?  Is there any point in unregistering the
> main target image?  I mean at that point the process is exiting anyway.
> The importance of unregistering target images registered from shared
> libraries is that they should be unregistered when they are dlclosed.

liboffloadmic finalizes the target process correctly in
__offload_fini_library, which is called only when the main target image is
unregistered.
Without this finalization, the target process is destroyed after
libcoi_host.so is unloaded.  Some DSO may then call GOMP_offload_unregister_ver
from its destructor, which will try to unload a target image from the already destroyed
process.  This issue is reproducible only using real COI.
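
The registration pattern itself can be sketched in plain C as follows.  This is only an illustration with hypothetical names and a flag in place of the real liboffloadmic calls: the image is marked as registered, and a handler that undoes this is installed with atexit, whose return value is checked.

```c
#include <stdlib.h>

/* Hypothetical flag standing in for "the main image is registered".  */
static int main_image_registered;

/* Exit handler: undo the registration.  */
static void
unregister_main_image_sketch (void)
{
  main_image_registered = 0;
}

/* Register the image and arrange for it to be unregistered at exit.
   Returns 0 on success, nonzero if atexit could not install the handler.  */
static int
register_main_image_sketch (void)
{
  main_image_registered = 1;
  return atexit (unregister_main_image_sketch);
}
```

Handlers installed with atexit run in reverse order of registration when the process exits normally.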

  -- Ilya


Re: [gomp4.1] Handle linear clause modifiers in declare simd

2015-11-18 Thread Ilya Verbin
Hi!

On Wed, Jul 01, 2015 at 12:55:38 +0200, Jakub Jelinek wrote:
> I've committed following patch, which per the new ABI additions
> mangles and handles the various new linear clause modifiers in
> declare simd functions.  The vectorizer side is not done yet,
>
> [...]
>
> @@ -14195,12 +14216,25 @@ simd_clone_mangle (struct cgraph_node *n
>  {
>struct cgraph_simd_clone_arg arg = clone_info->args[n];
>  
> -  if (arg.arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
> - pp_character (&pp, 'u');
> -  else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
> +  switch (arg.arg_type)
>   {
> -   gcc_assert (arg.linear_step != 0);
> + case SIMD_CLONE_ARG_TYPE_UNIFORM:
> +   pp_character (&pp, 'u');
> +   break;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
> pp_character (&pp, 'l');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
> +   pp_character (&pp, 'R');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
> +   pp_character (&pp, 'L');
> +   goto mangle_linear;
> + case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP:
> +   pp_character (&pp, 'U');
> +   goto mangle_linear;
> + mangle_linear:
> +   gcc_assert (arg.linear_step != 0);

Could you please point to where the new ABI additions are documented?
I can't find R/L/U parameter types in [1] and [2].

[1] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
[2] https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4

Thanks,
  -- Ilya


Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization

2015-11-18 Thread Ilya Verbin
On Fri, Sep 25, 2015 at 17:28:25 +0200, Jakub Jelinek wrote:
> On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iver...@gmail.com> wrote:
> > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > performed at the same time) in accessing/reallocating offload_images
> > > > and num_offload_images and the lack of support to register further
> > > > images after the gomp_target_init call (if you dlopen further shared
> > > > libraries) is really bad.  And it would be really nice to support the
> > > > unloading.
> > 
> > > Here is the latest patch for libgomp and mic plugin.
> > 
> > > libgomp/
> > 
> > >   * target.c (register_lock): New mutex for offload image registration.
> > 
> > >   (GOMP_offload_register): Add mutex lock.
> 
> That is definitely wrong.  You'd totally break --disable-linux-futex support
> on linux and bootstrap on e.g. Solaris and various other pthread targets.

I don't quite understand, do you mean that gcc 5 and trunk are broken, because
register_lock doesn't have initialization?  But it seems that bootstrap on
Solaris and other targets works fine...

> At least for ELF and dynamic linking, shared libraries that contain
> constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
> and thus libgomp's constructors should be run before the constructors of
> the libraries that call GOMP_offload_register*.

So, libgomp should contain a constructor, which will call gomp_mutex_init
(&register_lock) before any call to GOMP_offload_register*, right?

> For the targets without known zero initializer for gomp_mutex_lock, either
> there is an option to use pthread_once to make sure it is initialized once,
> or there is an option to define a macro like GOMP_MUTEX_INITIALIZER,
> defined to PTHREAD_MUTEX_INITIALIZER in config/posix/mutex.h and to
> { 0 } in config/linux/mutex.h and something like {} or whatever in
> config/rtems/mutex.h.  Then for the non-automatic non-heap
> gomp_mutex_t's you could just initialize them in their initializers
> with GOMP_MUTEX_INITIALIZER.
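
To make the static-initializer option concrete, here is a small sketch using plain pthreads.  The macro and variable names are hypothetical and only mirror the GOMP_MUTEX_INITIALIZER idea quoted above; libgomp's gomp_mutex_t is not used here.

```c
#include <pthread.h>

/* Hypothetical macro, modelled on the GOMP_MUTEX_INITIALIZER idea; for a
   POSIX mutex the portable static initializer is PTHREAD_MUTEX_INITIALIZER.  */
#define SKETCH_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER

/* A file-scope lock that is valid before any constructor runs.  */
static pthread_mutex_t register_lock_sketch = SKETCH_MUTEX_INITIALIZER;
static int num_registered_images;

/* Count one registration under the lock and return the new total.  */
static int
register_image_sketch (void)
{
  pthread_mutex_lock (&register_lock_sketch);
  int n = ++num_registered_images;
  pthread_mutex_unlock (&register_lock_sketch);
  return n;
}
```

With a static initializer there is no ordering dependency between the lock's initialization and the first caller, which is the property needed for calls made from constructors.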

  -- Ilya


[gomp4.5] Handle #pragma omp declare target link

2015-11-16 Thread Ilya Verbin
Hi!

On Mon, Oct 26, 2015 at 20:49:40 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > > Without declare target link or to, you can't use the global variables
> > > in orphaned accelerated routines (unless you e.g. take the address of the
> > > mapped variable in the region and pass it around).
> > > The to variables (non-deferred) are always mapped and are initialized with
> > > the original initializer, refcount is infinity.  link (deferred) work more
> > > like the normal mapping, referencing those vars when they aren't explicitly
> > > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> > > with to kind, it gets the current value of the host var rather than the
> > > original one.  But, beyond the mapping the compiler needs to ensure that
> > > all uses of the link global var (or perhaps just all uses of the link global
> > > var outside of the target construct body where it is mapped, because you
> > > could use there the pointer you got from GOMP_target) are replaced by
> > > dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> > > &*a_tmp, and that the runtime library during registration of the tables is
> > > told about the address of this artificial pointer.  During registration,
> > > I'd expect it would stick an entry for this range into the table, with some
> > > special flag or something similar, indicating that it is deferred mapping
> > > and where the offloading device pointer is.  During mapping, it would map it
> > > as any other not yet mapped object, but additionally would also set this
> > > device pointer to the device address of the mapped object.  We also need to
> > > ensure that when we drop the refcount of that mapping back to 0, we get it
> > > back to the state where it is described as a range with registered deferred
> > > mapping and where the device pointer is.
> > 
> > Ok, got it, I'll try to implement this...
> 
> Thanks.
> 
> > > > > we actually replace the variables with pointers to variables, then need
> > > > > to somehow also mark those in the offloading tables, so that the library
> > > > 
> > > > I see 2 possible options: use the MSB of the size, or introduce the third field
> > > > for flags.
> > > 
> > > Well, it can be either recorded in the host variable tables (which contain
> > > address and size pair, right), or in corresponding offloading device table
> > > (which contains the pointer, something else?).
> > 
> > It contains a size too, which is checked in libgomp:
> >   gomp_fatal ("Can't map target variables (size mismatch)");
> > Yes, we can remove this check, and use second field in device table for flags.
> 
> Yeah, or e.g. just use MSB of that size (so check that either the size is
> the same (then it is target to) or it is MSB | size (then it is target link).
> Objects larger than half of the address space aren't really supportable
> anyway.
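
The MSB encoding discussed above can be sketched as follows.  All names are hypothetical and this is not the libgomp implementation; it only shows the check "same size means to, MSB | size means link".

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical flag: the most significant bit of a pointer-sized size.  */
#define LINK_FLAG ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

enum entry_kind { ENTRY_MISMATCH, ENTRY_TO, ENTRY_LINK };

/* Size as it would be recorded for a "declare target link" variable.  */
static uintptr_t
mark_as_link (uintptr_t size)
{
  return size | LINK_FLAG;
}

/* Compare the host size with the size recorded in the device table.  */
static enum entry_kind
classify_entry (uintptr_t host_size, uintptr_t device_size)
{
  if (device_size == host_size)
    return ENTRY_TO;
  if (device_size == (LINK_FLAG | host_size))
    return ENTRY_LINK;
  return ENTRY_MISMATCH;
}
```

The scheme relies on the assumption stated in the quoted text that no real object needs the top bit of its size.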

Here is a WIP patch, not for check-in.  There are still many FIXMEs, which I am
going to resolve; however, the target-link-1.c testcase passes.
Is this approach correct?  Any comments on the FIXMEs?


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 23d0107..58771c0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d1f4970..b890f6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34999,7 +34999,10 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
  g->have_offload = true;
  if (is_a  (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 67a9024..878a9c5 100644
--- a/gcc/

Re: [PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-11-16 Thread Ilya Verbin
On Tue, Sep 08, 2015 at 22:41:17 +0300, Ilya Verbin wrote:
> This patch supports unloading of target images from the device.
> Unfortunately __offload_unregister_image requires the whole descriptor for
> unloading, which must contain target code inside, for this reason the plugin
> keeps descriptors for all offloaded images in memory.
> Also the patch removes useless variable names, intended for debug purposes.
> Regtested with make check-target-libgomp and using a dlopen/dlclose test.
> OK for trunk?
> 
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (struct TargetImageDesc): New.
>   (ImgDescMap): New typedef.
>   (image_descriptors): New static var.
>   (init): Allocate image_descriptors.
>   (offload): Remove vars2 argument.  Pass NULL to __offload_offload1
>   instead of vars2.
>   (unregister_main_image): New static function.
>   (register_main_image): Call unregister_main_image at exit.
>   (GOMP_OFFLOAD_init_device): Print device number, fix offload args.
>   (GOMP_OFFLOAD_fini_device): Likewise.
>   (get_target_table): Remove vd1g and vd2g, don't pass them to offload.
>   (offload_image): Remove declaration of the struct TargetImage.
>   Free table.  Insert new descriptor into image_descriptors.
>   (GOMP_OFFLOAD_unload_image): Call __offload_unregister_image, free
>   the corresponding descriptor, and remove it from address_table and
>   image_descriptors.
>   (GOMP_OFFLOAD_alloc): Print device number, remove vd1g.
>   (GOMP_OFFLOAD_free): Likewise.
>   (GOMP_OFFLOAD_host2dev): Print device number, remove vd1g and vd2g.
>   (GOMP_OFFLOAD_dev2host): Likewise.
>   (GOMP_OFFLOAD_run): Print device number, remove vd1g.
>   * plugin/offload_target_main.cpp (__offload_target_table_p1): Remove
>   vd2, don't pass it to __offload_target_enter.
>   (__offload_target_table_p2): Likewise.
>   (__offload_target_alloc): Likewise.
>   (__offload_target_free): Likewise.
>   (__offload_target_host2tgt_p1): Likewise.
>   (__offload_target_host2tgt_p2): Likewise.
>   (__offload_target_tgt2host_p1): Likewise.
>   (__offload_target_tgt2host_p2): Likewise.
>   (__offload_target_run): Likewise.

Ping?  Rebased and retested.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 772e198..6ee585e 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -65,6 +65,17 @@ typedef std::vector DevAddrVect;
 /* Addresses for all images and all devices.  */
 typedef std::map ImgDevAddrMap;
 
+/* Image descriptor needed by __offload_[un]register_image.  */
+struct TargetImageDesc {
+  int64_t size;
+  /* 10 characters is enough for max int value.  */
+  char name[sizeof ("lib00.so")];
+  char data[];
+} __attribute__ ((packed));
+
+/* Image descriptors, indexed by a pointer obtained from libgomp.  */
+typedef std::map ImgDescMap;
+
 
 /* Total number of available devices.  */
 static int num_devices;
@@ -76,6 +87,9 @@ static int num_images;
second key is number of device.  Contains a vector of pointer pairs.  */
 static ImgDevAddrMap *address_table;
 
+/* Descriptors of all images, registered in liboffloadmic.  */
+static ImgDescMap *image_descriptors;
+
 /* Thread-safe registration of the main image.  */
 static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
 
@@ -156,6 +170,7 @@ init (void)
 
 out:
   address_table = new ImgDevAddrMap;
+  image_descriptors = new ImgDescMap;
   num_devices = _Offload_number_of_devices ();
 }
 
@@ -192,14 +207,13 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
+int num_vars, VarDesc *vars, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
   if (ofld)
 {
   if (async_data == NULL)
-   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
-   NULL);
+   __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL);
   else
{
  OffloadFlags flags;
@@ -217,13 +231,27 @@ offload (const char *file, uint64_t line, int device, const char *name,
 }
 
 static void
+unregister_main_image ()
+{
+  __offload_unregister_image (&main_target_image);
+}
+
+static void
 register_main_image ()
 {
+  /* Do not check the return value, because old versions of liboffloadmic did
+ not have return values.  */
   __offload_register_image (&main_target_image);
 
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
  asynchronous task on target is completed.  */
   __offload_register_task_callback (GOMP

Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 16:11:50 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 11:18:41AM +0100, Jakub Jelinek wrote:
> > For the offloading case, I actually see a problematic spot, namely that
> > GOMP_PLUGIN_target_task_completion could finish too early, and get the
> > task_lock before the thread that run the gomp_target_task_fn doing map_vars
> > + async_run for it.  Bet I need to add further ttask state kinds and deal
> > with that case (so GOMP_PLUGIN_target_task_completion would just take the
> > task lock and tweak ttask state if it has not been added to the queues
> > yet).
> > Plus I think I want to improve the case where we are not waiting, in
> > gomp_create_target_task if not waiting for dependencies actually schedule
> > manually the gomp_target_task_fn.
> 
> These two have been resolved, plus target-34.c issue resolved too (the bug
> was that I've been too lazy and just put target-33.c test into #pragma omp
> parallel #pragma omp single, but that is invalid OpenMP, as single is a
> worksharing region and #pragma omp barrier may not be encountered in such a
> region.  Fixed by rewriting the testcase.
> 
> So here is a full patch that passes for me both non-offloading and
> offloading, OMP_NUM_THREADS=16 (implicit on my box) as well as
> OMP_NUM_THREADS=1 (explicit).  I've incorporated your incremental patch.
> 
> One option to avoid the static variable would be to pass two pointers
> instead of one (async_data), one would be the callback function pointer,
> another argument to it.  Or another possibility would be to say that
> the async_data argument the plugin passes to liboffloadmic would be
> pointer to structure, holding a function pointer (completion callback)
> and the data pointer to pass to it, and then the plugin would just
> GOMP_PLUGIN_malloc 2 * sizeof (void *) for it, fill it in and
> register some function in itself that would call the
> GOMP_PLUGIN_target_task_completion with the second structure element
> as argument and then free the structure pointer.
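
The second option (a small structure holding the completion callback and its argument) could look roughly like the sketch below.  The names are hypothetical, plain malloc is used instead of GOMP_PLUGIN_malloc, and mark_done is just a sample callback.

```c
#include <stdlib.h>

/* Hypothetical pair passed to the offload library as opaque async data.  */
struct async_completion
{
  void (*fn) (void *);	/* Completion callback.  */
  void *data;		/* Argument to pass to it.  */
};

/* Allocate and fill the pair.  Returns NULL if the allocation fails.  */
static struct async_completion *
async_completion_create (void (*fn) (void *), void *data)
{
  struct async_completion *c = malloc (sizeof *c);
  if (c == NULL)
    return NULL;
  c->fn = fn;
  c->data = data;
  return c;
}

/* Function the plugin would register: call the stored callback with the
   stored argument, then free the pair.  */
static void
async_completion_fire (void *opaque)
{
  struct async_completion *c = opaque;
  c->fn (c->data);
  free (c);
}

/* Sample callback for illustration: set the int pointed to by DATA.  */
static void
mark_done (void *data)
{
  *(int *) data = 1;
}
```

With this layout the offload library does not need to know about any libgomp symbol; it only hands the opaque pointer back on completion.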

I don't know which interface to implement to maintain compatibility in the
future.
Anyway, currently it's impossible that a process will use the same liboffloadmic
for 2 different offloading paths (say GCC's in exec and ICC's in a dso), because
in fact GCC's and ICC's libraries are not the same.  First of all, they have
different names: liboffloadmic in GCC and just liboffload in ICC.  And most
importantly, ICC's version contains some references to libiomp5, which were
removed from GCC's version.  In theory, we want to use one library with all
compilers, but I'm not sure when it will be possible.

> Do you get still crashes on any of the testcases with this?

No, all tests now pass using emul.  I'll report when I have any results on HW.

Thanks,
  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-13 Thread Ilya Verbin
On Fri, Nov 13, 2015 at 17:41:53 +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 07:37:17PM +0300, Ilya Verbin wrote:
> > I don't know which interface to implement to maintain compatibility in the
> > future.
> > Anyway, currently it's impossible that a process will use the same liboffloadmic
> > for 2 different offloading paths (say GCC's in exec and ICC's in a dso), because
> > in fact GCC's and ICC's libraries are not the same.  First of all, they have
> > different names: liboffloadmic in GCC and just liboffload in ICC.  And most
> > importantly, ICC's version contains some references to libiomp5, which were
> > removed from GCC's version.  In theory, we want to use one library with all
> > compilers, but I'm not sure when it will be possible.
> 
> Ok, in that case it is less of a problem.
> 
> > > Do you get still crashes on any of the testcases with this?
> > 
> > No, all tests now pass using emul.  I'll report when I have any results on HW.
> 
> Perfect, I'll commit it to gomp-4_5-branch then.

make check-target-libgomp with offloading to HW also passed :)

And this:

+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -3,6 +3,7 @@
 
 int main ()
 {
+  int x = 1;
   int a = 0, b = 0, c = 0, d[7];
 
   #pragma omp parallel
@@ -18,6 +19,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
+  while (x);
   usleep (1000);
   #pragma omp atomic update
   b |= 4;
@@ -25,6 +27,7 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
+  while (x);
   usleep (5000);
   #pragma omp atomic update
   b |= 1;

demonstrates 200% CPU usage both using emul and HW, so 2 target tasks really run
concurrently.

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Thu, Nov 12, 2015 at 18:45:09 +0100, Jakub Jelinek wrote:
> But the testcase I wrote (target-33.c) hangs, the problem is in the
>   #pragma omp target nowait map (tofrom: a, b) depend(out: d[3])
>   {
> #pragma omp atomic update
> a = a + 9;
> b -= 8;
>   }
>   #pragma omp target nowait map (tofrom: a, c) depend(out: d[4])
>   {
> #pragma omp atomic update
> a = a + 4;
> c >>= 1;
>   }
>   #pragma omp task if (0) depend (in: d[3], d[4])
>   if (a != 50 || b != 4 || c != 20)
> abort ();
> part, where (I should change that for the case of no dependencies
> eventually) the task with map_vars+async_run is queued in both cases,
> then we reach GOMP_task, which calls gomp_task_maybe_wait_for_dependencies
> which spawns the first half task (map_vars+async_run), and then
> the second half task (map_vars+async_run), but that one gets stuck somewhere
> in liboffloadmic, then some other thread (from liboffloadmic) calls
> GOMP_PLUGIN_target_task_completion and enqueues the second half of the first
> target task (unmap_vars), but as the only normal thread in the main program
> is stuck in liboffloadmic (during gomp_map_vars, trying to allocate
> target memory in the plugin), there is no thread to schedule the second half
> of first target task.  So, if liboffloadmic is stuck waiting for unmap_vars,
> it is a deadlock.  Can you please try to debug this?

I'm unable to reproduce the hang (have tried various values of OMP_NUM_THREADS).
The testcase just aborts at (a != 50 || b != 4 || c != 20), because
a == 37, b == 12, c == 40.

BTW, I don't know whether this is a bug or not:
Conditional jump or move depends on uninitialised value(s)
   at 0x4C2083D: priority_queue_insert (priority_queue.h:347)
   by 0x4C24DF9: GOMP_PLUGIN_target_task_completion (task.c:678)

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Thu, Nov 12, 2015 at 18:58:22 +0100, Jakub Jelinek wrote:
> > Unfortunately, target-32.c fails for me using emulation mode:
> 
> I haven't managed to get it stuck yet (unlike the target-33.c one, see
> another mail), what OMP_NUM_THREADS you are using
> and how many cores/threads?

OMP_NUM_THREADS isn't set.  40 cores.

  -- Ilya


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> > So, here is what I have for now.  Attached target-29.c testcase works fine with
> > MIC emul, however I don't know how to (and where) properly check for completion
> > of async execution on target.  And, similarly, where to do unmapping after that?
> > Do we need a callback from plugin to libgomp (as far as I understood, PTX
> > runtime supports this, but HSA doesn't), or libgomp will just check for
> > ttask->is_completed in task.c?
> 
> Here is the patch updated to have a task.c defined function that the plugin
> can call upon completion of async offloading execution.

Thanks.

> The testsuite coverage will need to improve, the testcase is wrong
> (contains data races - if you want to test parallel running of two target
> regions that both touch the same var, I'd say best would be to use
> #pragma omp atomic and or in 4 in one case and 1 in another case, then
> test if result is 5 (and similarly for the other var).
> Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
> not have it, but we'll need to find some solution for that.
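
A race-free variant of that idea, reduced to host-only code, might look like the sketch below.  It uses parallel sections instead of target nowait regions (an assumption made to keep it self-contained), and the function name is hypothetical.

```c
/* Two sections update the same variable.  Each ORs in a distinct bit under
   "omp atomic", so the result does not depend on the execution order.
   Without -fopenmp the pragmas are ignored and the code runs serially.  */
static int
or_bits_concurrently (void)
{
  int b = 0;

  #pragma omp parallel sections shared(b)
  {
    #pragma omp section
    {
      #pragma omp atomic update
      b |= 4;
    }
    #pragma omp section
    {
      #pragma omp atomic update
      b |= 1;
    }
  }

  /* Both bits must be present once the sections have finished.  */
  return b;
}
```

Checking for 5 afterwards verifies that both regions ran and that neither update was lost.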
> 
> Another thing to work on beyond testsuite coverage (it is desirable to test
> nowait target tasks (both depend and without depend) being awaited in all
> the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
> taskgroup, or if (0) task with depend clause waiting on that.
> 
> Also, I wonder what to do if #pragma omp target nowait is used outside of
> (host) parallel - when team is NULL.  All the tasking code in that case just
> executes tasks undeferred, which is fine for all but target nowait - there
> it is I'd say useful to be able to run a single host thread concurrently
> with some async offloading tasks.  So, I wonder if in that case,
> if we encounter target nowait with team == NULL, should not just create a
> dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
> if (0) starting above it and ending at program's end.  In OpenMP, the
> program's initial thread is implicitly surrounded by inactive parallel, so
> this isn't anything against the OpenMP execution model.  But we'd need to
> free the team somewhere in a destructor.
>
> Can you please try to cleanup the liboffloadmic side of this, so that
> a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> is used?

Do you mean something like the patch below?  I'll discuss it with liboffloadmic
maintainers.

> Can you make sure it works on XeonPhi non-emulated too?

I'm trying to do it, but it will take some time...

Unfortunately, target-32.c fails for me using emulation mode:

Program received signal SIGSEGV, Segmentation fault.
#0  0x7ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422
#1  0x7ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195
#2  0x7ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:468
#3  0x7ff4ab11570d in gomp_task_maybe_wait_for_dependencies (depend=0x7ff49b0d9de0) at libgomp/task.c:1539
#4  0x7ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 <.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662
#5  0x004011f9 in main._omp_fn ()
#6  0x7ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at libgomp/team.c:119
#7  0x003b07e07ee5 in start_thread () from /lib64/libpthread.so.0
#8  0x003b076f4b8d in clone () from /lib64/libc.so.6

However, when I manually run the commands from testsuite/libgomp.log under the
same environment, it passes.  I don't know where the difference is.

Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only
100% CPU usage in offload_target_main instead of 200%, so it seems that only one
target task runs at a time.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6da09b1..772e198 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -220,6 +220,10 @@ static void
 register_main_image ()
 {
   __offload_register_image (&main_target_image);
+
+  /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
+ asynchronous task on target is completed.  */
+  __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -537

Re: [ptx] partitioning optimization

2015-11-10 Thread Ilya Verbin
> I've been unable to introduce a testcase for this. The difficulty is we want
> to check an rtl dump from the acceleration compiler, and there doesn't
> appear to be existing machinery for that in the testsuite.  Perhaps
> something to be added later?

I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?

  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-11-02 Thread Ilya Verbin
On Fri, Oct 30, 2015 at 20:12:25 +0100, Jakub Jelinek wrote:
> On Fri, Oct 30, 2015 at 08:44:07PM +0300, Ilya Verbin wrote:
> > On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> > > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > > As the testcases show, #pragma omp declare target has now a new form (well,
> > > > two; with some issues on it pending), where it is used just as a single
> > > > declarative directive rather than a pair of them and allows marking
> > > > vars and functions by name as "omp declare target" vars/functions (which the
> > > > middle-end etc. already handles),
> > > 
> > > There is an issue - such variables are not added to the offloading tables,
> > > because when varpool_node::get_create is called for the first time, the variable
> > > doesn't yet have "omp declare target" attribute, and when it's called for the
> > > second time, it just returns existing node.  Functions also aren't marked as
> > > offloadable.  I tried to fix this by moving the code from
> > > varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> > > but doesn't fix C++.  Therefore, I decided to iterate through all functions and
> > > variables, like in the patch below.  But it doesn't work for static vars,
> > > declared inside functions, because they do not appear in symtab :(
> > 
> > Ping?  Where should I set node->offloadable for "omp declare target to (list)"
> > functions, global and static vars?
> 
> Perhaps already somewhere in the FEs?  I mean, when the varpool node is
> created after the decl has that attribute, it already should set offloadable
> itself, so perhaps when adding the attribute check if corresponding varpool
> node exists already (but don't create it) and if yes, set offloadable?

Here is the patch.
make check RUNTESTFLAGS=gomp.exp and check-target-libgomp passed.
OK for gomp-4_5-branch?


gcc/c/
* c-parser.c: Include context.h.
(c_parser_omp_declare_target): If decl has "omp declare target" or
"omp declare target link" attribute, and cgraph or varpool node already
exists, then set corresponding flags.
gcc/cp/
* parser.c: Include context.h.
(cp_parser_omp_declare_target): If decl has "omp declare target" or
"omp declare target link" attribute, and cgraph or varpool node already
exists, then set corresponding flags.
libgomp/
* testsuite/libgomp.c++/target-13.C: Add global variable with "omp
declare target ()" directive, use it in foo.
* testsuite/libgomp.c/target-28.c: Likewise.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a169457..049417c 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
 #include "gimple-expr.h"
+#include "context.h"
 
 
 /* Initialization routine for this file.  */
@@ -15600,7 +15601,22 @@ c_parser_omp_declare_target (c_parser *parser)
  continue;
}
   if (!at1)
-   DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+   {
+ symtab_node *node = symtab_node::get (t);
+ DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ if (node != NULL)
+   {
+ node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+ g->have_offload = true;
+ if (is_a  (node))
+   {
+ vec_safe_push (offload_vars, t);
+ node->force_output = 1;
+   }
+#endif
+   }
+   }
 }
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a374e6c..de77a4b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
+#include "context.h"
 
 
 /* The lexer.  */
@@ -34773,7 +34774,22 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
  continue;
}
   if (!at1)
-   DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+   {
+ symtab_node *node = symtab_node::get (t);
+ DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+ if (node != NULL)
+   {
+ node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+ g->have_offload = true;
+ if 

Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-30 Thread Ilya Verbin
On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > As the testcases show, #pragma omp declare target has now a new form (well,
> > two; with some issues on it pending), where it is used just as a single
> > declarative directive rather than a pair of them and allows marking
> > vars and functions by name as "omp declare target" vars/functions (which the
> > middle-end etc. already handles),
> 
> There is an issue - such variables are not added to the offloading tables,
> because when varpool_node::get_create is called for the first time, the variable
> doesn't yet have "omp declare target" attribute, and when it's called for the
> second time, it just returns existing node.  Functions also aren't marked as
> offloadable.  I tried to fix this by moving the code from
> varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> but doesn't fix C++.  Therefore, I decided to iterate through all functions and
> variables, like in the patch below.  But it doesn't work for static vars,
> declared inside functions, because they do not appear in symtab :(

Ping?  Where should I set node->offloadable for "omp declare target to (list)"
functions, global and static vars?

Thanks,
  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-27 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles),

There is an issue - such variables are not added to the offloading tables,
because when varpool_node::get_create is called for the first time, the variable
doesn't yet have the "omp declare target" attribute, and when it's called for
the second time, it just returns the existing node.  Functions also aren't
marked as offloadable.  I tried to fix this by moving the code from
varpool_node::get_create to varpool_node::finalize_decl, but that helped only C
and doesn't fix C++.  Therefore, I decided to iterate through all functions and
variables, like in the patch below.  But it doesn't work for static vars
declared inside functions, because they do not appear in symtab :(
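
To illustrate the caching problem described above, here is a minimal standalone
sketch in C.  It is not GCC code: struct decl, struct node, get_create and
mark_offloadable are hypothetical stand-ins for a decl, its symtab node,
varpool_node::get_create and a late pass over all symbols.  It only assumes
that the attribute is inspected once, when the node is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a declaration and its symbol-table node.  */
struct node;
struct decl
{
  bool has_declare_target;	/* models the "omp declare target" attribute */
  struct node *node;		/* cached node, NULL until the first get_create */
};
struct node
{
  struct decl *decl;
  bool offloadable;
};

#define POOL_SIZE 16
static struct node pool[POOL_SIZE];
static size_t pool_used;

/* The attribute is looked at only when the node is first created.  */
struct node *
get_create (struct decl *d)
{
  if (d->node != NULL)
    return d->node;		/* second call: the attribute is not re-checked */
  assert (pool_used < POOL_SIZE);
  struct node *n = &pool[pool_used++];
  n->decl = d;
  n->offloadable = d->has_declare_target;
  d->node = n;
  return n;
}

/* Late pass: visit every existing node and re-read the attribute.  */
void
mark_offloadable (void)
{
  for (size_t i = 0; i < pool_used; i++)
    if (pool[i].decl->has_declare_target)
      pool[i].offloadable = true;
}
```

If the attribute is attached after the first get_create call, the node stays
non-offloadable until something like mark_offloadable runs.  A decl that never
got a node (the function-local static case) is not visited at all, which
matches the limitation mentioned above.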


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 1a64d789..0ba04ef 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -511,16 +511,6 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp)
-  && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-{
-  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-  g->have_offload = true;
-#endif
-}
-
   node->register_symbol ();
 
   if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 04a4d3f..9ac7b36 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1016,6 +1016,25 @@ analyze_functions (bool first_time)
   symtab->state = CONSTRUCTION;
   input_location = UNKNOWN_LOCATION;
 
+  /* Process offloadable functions and variables.  */
+  if (first_time && (flag_openacc || flag_openmp))
+FOR_EACH_SYMBOL (node)
+  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+   {
+ node->offloadable = 1;
+
+#ifdef ENABLE_OFFLOADING
+ g->have_offload = true;
+
+ if (TREE_CODE (node->decl) == VAR_DECL && !DECL_EXTERNAL (node->decl))
+   {
+ if (!in_lto_p)
+   vec_safe_push (offload_vars, node->decl);
+ node->force_output = 1;
+   }
+#endif
+   }
+
   /* Ugly, but the fixup can not happen at a time same body alias is created;
  C++ FE is confused about the COMDAT groups being right.  */
   if (symtab->cpp_implicit_aliases_done)
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 7d11e20..077dd40 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -154,19 +154,6 @@ varpool_node::get_create (tree decl)
 
   node = varpool_node::create_empty ();
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
-  && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-{
-  node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-  g->have_offload = true;
-  if (!in_lto_p)
-   vec_safe_push (offload_vars, decl);
-  node->force_output = 1;
-#endif
-}
-
   node->register_symbol ();
   return node;
 }
diff --git a/libgomp/testsuite/libgomp.c++/target-13.C b/libgomp/testsuite/libgomp.c++/target-13.C
index 376672d..5279ac0 100644
--- a/libgomp/testsuite/libgomp.c++/target-13.C
+++ b/libgomp/testsuite/libgomp.c++/target-13.C
@@ -1,11 +1,14 @@
 extern "C" void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
diff --git a/libgomp/testsuite/libgomp.c/target-28.c b/libgomp/testsuite/libgomp.c/target-28.c
index c9a2999..96e9e05 100644
--- a/libgomp/testsuite/libgomp.c/target-28.c
+++ b/libgomp/testsuite/libgomp.c/target-28.c
@@ -1,11 +1,14 @@
 extern void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
 
  -- Ilya


Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-26 Thread Ilya Verbin
On Mon, Oct 26, 2015 at 20:05:39 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > As the testcases show, #pragma omp declare target has now a new form 
> > > (well,
> > > two; with some issues on it pending), where it is used just as a single
> > > declarative directive rather than a pair of them and allows marking
> > > vars and functions by name as "omp declare target" vars/functions (which 
> > > the
> > > middle-end etc. already handles), but also "omp declare target link", 
> > > which
> > > is a deferred var, that is not initially mapped (on devices without shared
> > > memory with host), but has to be mapped explicitly.
> > 
> > I don't quite understand how link should work.  OpenMP 4.5 says:
> > 
> > "The list items of a link clause are not mapped by the declare target 
> > directive.
> > Instead, their mapping is deferred until they are mapped by target data or
> > target constructs. They are mapped only for such regions."
> >
> > But doesn't this mean that the example below should work identically
> > with/without USE_LINK defined?  Or is there some difference on other 
> > testcases?
> 
> On your testcase, the end result is pretty much the same, the variable is
> not mapped initially to the device, and at the beginning of omp target it is
> mapped to device, at the end of the region it is unmapped from the device
> (without copying back).
> 
> But consider:
> 
> int a = 1, b = 1;
> #pragma omp declare target link (a) to (b)
> int
> foo (void)
> {
>   return a++ + b++;
> }
> #pragma omp declare target to (foo)
> int
> main ()
> {
>   a = 2;
>   b = 2;
>   int res;
>   #pragma omp target map (to: a, b) map (from: res)
>   {
> res = foo () + foo ();
>   }
>   // This assumes only non-shared address space, so would need to be guarded
>   // for that.
>   if (res != (2 + 1) + (3 + 2))
> __builtin_abort ();
>   return 0;
> }
> 
> Without declare target link or to, you can't use the global variables
> in orphaned accelerated routines (unless you e.g. take the address of the
> mapped variable in the region and pass it around).
> The to variables (non-deferred) are always mapped and are initialized with
> the original initializer, refcount is infinity.  link (deferred) work more
> like the normal mapping, referencing those vars when they aren't explicitly
> (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> with to kind, it gets the current value of the host var rather than the
> original one.  But, beyond the mapping the compiler needs to ensure that
> all uses of the link global var (or perhaps just all uses of the link global
> var outside of the target construct body where it is mapped, because you
> could use there the pointer you got from GOMP_target) are replaced by
> dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> &*a_tmp, and that the runtime library during registration of the tables is
> told about the address of this artificial pointer.  During registration,
> I'd expect it would stick an entry for this range into the table, with some
> special flag or something similar, indicating that it is deferred mapping
> and where the offloading device pointer is.  During mapping, it would map it
> as any other not yet mapped object, but additionally would also set this
> device pointer to the device address of the mapped object.  We also need to
> ensure that when we drop the refcount of that mapping back to 0, we get it
> back to the state where it is described as a range with registered deferred
> mapping and where the device pointer is.

Ok, got it, I'll try to implement this...
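
To check my understanding, here is a minimal host-only sketch in C of the
scheme described above.  It is not libgomp code: struct link_entry, link_map,
link_unmap, a_link and foo_dev are hypothetical names, and the "device" copy is
simulated with malloc.  Accesses to the link variable go through an artificial
pointer that is set when the variable is mapped and reset when the refcount
drops back to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry registered for one "link" variable.  */
struct link_entry
{
  void *host_addr;	/* address of the host variable */
  size_t size;		/* its size in bytes */
  void **dev_ptr;	/* location of the artificial pointer */
  unsigned refcount;	/* 0 means "registered, but not mapped" */
};

int a = 1;		/* host variable, declared "link" */
void *a_link;		/* artificial pointer used by offloaded code */
struct link_entry a_entry = { &a, sizeof a, &a_link, 0 };

/* Offloaded routine: "a++" has been rewritten to "(*a_link)++".  */
int
foo_dev (void)
{
  return (*(int *) a_link)++;
}

/* Map with "to" kind: copy the current host value, then set the pointer.  */
int
link_map (struct link_entry *e)
{
  if (e->refcount == 0)
    {
      void *copy = malloc (e->size);
      if (copy == NULL)
	return -1;
      memcpy (copy, e->host_addr, e->size);
      *e->dev_ptr = copy;
    }
  e->refcount++;
  return 0;
}

/* Unmap: on the last release go back to the deferred state.  */
void
link_unmap (struct link_entry *e)
{
  assert (e->refcount > 0);
  if (--e->refcount == 0)
    {
      free (*e->dev_ptr);
      *e->dev_ptr = NULL;
    }
}
```

With a set to 2 on the host before link_map, two calls of foo_dev see 2 and 3,
i.e. the current host value rather than the original initializer, and the host
copy stays untouched because nothing is copied back.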

> > > we actually replace the variables with pointers to variables, then need
> > > to somehow also mark those in the offloading tables, so that the library
> > 
> > I see 2 possible options: use the MSB of the size, or introduce the third 
> > field
> > for flags.
> 
> Well, it can be either recorded in the host variable tables (which contain
> address and size pair, right), or in corresponding offloading device table
> (which contains the pointer, something else?).

It contains a size too, which is checked in libgomp:
  gomp_fatal ("Can't map target variables (size mismatch)");
Yes, we can remove this check, and use the second field in the device table for
flags.

  -- Ilya


Re: [gomp4.1] map clause parsing improvements

2015-10-26 Thread Ilya Verbin
On Mon, Oct 26, 2015 at 14:07:13 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 03:53:57PM +0300, Ilya Verbin wrote:
> > @@ -7363,7 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p,
> >   n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
> >   if ((ctx->region_type & ORT_TARGET) != 0
> >   && !(n->value & GOVD_SEEN)
> > - && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
> > + && (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
> >   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
> 
> The || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT part can go then too,
> it was there only because (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
> has been non-zero for GOMP_MAP_STRUCT (and the () pair around the condition
> too).

Oops, missed that.

> We want to be able to remove all map clauses on the target construct, except
> if it is always {to,from,tofrom}.
> We do not want to remove release or delete, but those only exist on target
> exit data and thus are handled by (ctx->region_type & ORT_TARGET) != 0.
> 
> > @@ -142,6 +143,10 @@ enum gomp_map_kind
> >  #define GOMP_MAP_ALWAYS_FROM_P(X) \
> >(((X) == GOMP_MAP_ALWAYS_FROM) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
> >  
> > +#define GOMP_MAP_ALWAYS_P(X) \
> > +  (((X) == GOMP_MAP_ALWAYS_TO) || ((X) == GOMP_MAP_ALWAYS_FROM) \
> > +   || ((X) == GOMP_MAP_ALWAYS_TOFROM))
> 
> You could simplify this e.g. to
>   (((X) == GOMP_MAP_ALWAYS_TO) || GOMP_MAP_ALWAYS_FROM_P (X))
> or
>   (GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM))
> 
> Otherwise, LGTM.

Done.  Here is what I committed:


gcc/
* gimplify.c (gimplify_scan_omp_clauses): Use GOMP_MAP_ALWAYS_P.
(gimplify_adjust_omp_clauses): Likewise.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_2): Define.
(GOMP_MAP_FLAG_ALWAYS): Remove.
(enum gomp_map_kind): Use GOMP_MAP_FLAG_SPECIAL_2 instead of
GOMP_MAP_FLAG_ALWAYS for GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_STRUCT, GOMP_MAP_RELEASE.
(GOMP_MAP_ALWAYS_P): Define.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ee5cb95..a308307 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6613,7 +6613,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
flags |= GOVD_SEEN;
  goto do_add_decl;
}
@@ -6623,7 +6623,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
@@ -7363,8 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p,
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
- && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
- || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
+ && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
{
  remove = true;
  /* For struct element mapping, if struct is never referenced
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f834dec..008a4a4 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -39,10 +39,9 @@
 /* Special map kinds, enumerated starting here.  */
 #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3)
+#define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4)
 #define GOMP_MAP_FLAG_SPECIAL  (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
-/* OpenMP always flag.  */
-#define GOMP_MAP_FLAG_ALWAYS   (1 << 6)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE (1 << 7)
 
@@ -95,29 +94,31 @@ enum gomp_map_kind
 GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_F

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2015-10-26 Thread Ilya Verbin
On Fri, Oct 23, 2015 at 10:10:06 +0200, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 09:26:37PM +0300, Ilya Verbin wrote:
> > On Mon, Dec 22, 2014 at 13:01:40 +0100, Thomas Schwinge wrote:
> > > By chance (when tracking down a different problem), I've found the
> > > following.  Would you please check whether that's a real problem in
> > > liboffloadmic, or its libgomp plugin, or just a mis-diagnosis by
> > > Valgrind?
> > > 
> > > ==21327== Syscall param write(buf) points to uninitialised byte(s)
> > 
> > Finally we have investigated this :)  Valgrind warns about uninitialized 
> > bytes,
> > inserted into the struct for alignment.  It's possible to avoid the warning 
> > by
> > the patch below.  Should I commit it, or just leave it as is?
> 
> Or use calloc instead of malloc, or add two uint8_t padding fields after the
> two uint8_t fields and initialize them too.  Though, as you have some
> padding after the name, I think calloc is best.

Here is what I committed to trunk together with an obvious change.


liboffloadmic/
* runtime/offload_host.cpp (OffloadDescriptor::setup_misc_data): Use
calloc instead of malloc.
(__offload_fini_library): Set mic_engines_total to zero.


diff --git a/liboffloadmic/runtime/offload_host.cpp b/liboffloadmic/runtime/offload_host.cpp
index c6c6518..a150410 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -2424,8 +2424,8 @@ bool OffloadDescriptor::setup_misc_data(const char *name)
 }
 
 // initialize function descriptor
-m_func_desc = (FunctionDescriptor*) malloc(m_func_desc_size +
-   misc_data_size);
+m_func_desc = (FunctionDescriptor*) calloc(1, m_func_desc_size
+ + misc_data_size);
 if (m_func_desc == NULL)
   LIBOFFLOAD_ERROR(c_malloc);
 m_func_desc->console_enabled = console_enabled;
@@ -5090,6 +5090,7 @@ static void __offload_fini_library(void)
 OFFLOAD_DEBUG_TRACE(2, "Cleanup offload library ...\n");
 if (mic_engines_total > 0) {
 delete[] mic_engines;
+mic_engines_total = 0;
 
 if (mic_proxy_fs_root != 0) {
 free(mic_proxy_fs_root);


  -- Ilya


Re: [gomp4.1] map clause parsing improvements

2015-10-26 Thread Ilya Verbin
On Tue, Oct 20, 2015 at 12:03:40 +0200, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 05:00:33PM +0200, Thomas Schwinge wrote:
> >   n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
> >   if ((ctx->region_type & ORT_TARGET) != 0
> >   && !(n->value & GOVD_SEEN)
> >   && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
> >   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
> > {
> >   remove = true;
> > 
> > I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2,
> > and then provide a GOMP_MAP_ALWAYS_P that evaluates to true just for the
> > three "always,to", "always,from", and "always,tofrom" cases.
> 
> Yeah, that can be done, I'll add it to my todo list.

Is this what you planned?  I've replaced all 3 uses of GOMP_MAP_FLAG_ALWAYS with
GOMP_MAP_ALWAYS_P.  make check and check-target-libgomp passed, however these 2
changes in gimplify_scan_omp_clauses are not covered by the testsuite, so I'm
not entirely sure that they are correct.  OK for gomp-4_5-branch?


gcc/
* gimplify.c (gimplify_scan_omp_clauses): Use GOMP_MAP_ALWAYS_P.
(gimplify_adjust_omp_clauses): Likewise.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_2): Define.
(GOMP_MAP_FLAG_ALWAYS): Remove.
(enum gomp_map_kind): Use GOMP_MAP_FLAG_SPECIAL_2 instead of
GOMP_MAP_FLAG_ALWAYS for GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_STRUCT, GOMP_MAP_RELEASE.
(GOMP_MAP_ALWAYS_P): Define.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ee5cb95..57ab6c6 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6613,7 +6613,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
flags |= GOVD_SEEN;
  goto do_add_decl;
}
@@ -6623,7 +6623,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
- if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
+ if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)))
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
@@ -7363,7 +7363,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p,
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
- && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
+ && (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
{
  remove = true;
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f834dec..2c6f011 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -39,10 +39,9 @@
 /* Special map kinds, enumerated starting here.  */
 #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3)
+#define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4)
 #define GOMP_MAP_FLAG_SPECIAL  (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
-/* OpenMP always flag.  */
-#define GOMP_MAP_FLAG_ALWAYS   (1 << 6)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE (1 << 7)
 
@@ -95,29 +94,31 @@ enum gomp_map_kind
 GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
 /* If not already present, allocate.  And unconditionally copy to
device.  */
-GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
+GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_TO),
 /* If not already present, allocate.  And unconditionally copy from
device.  */
-GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
+GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_SPECIAL_2
+| GOMP_MAP_FROM),
 /* If not already present, allocate.  And unconditionally copy to and from
device.  */
-GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TOFROM),
+GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_SPECIAL_2
+| GOMP_MAP_TOFROM),
 /* Map a sparse struct; the 

Re: [gomp4.1] Handle new form of #pragma omp declare target

2015-10-26 Thread Ilya Verbin
On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles), but also "omp declare target link", which
> is a deferred var, that is not initially mapped (on devices without shared
> memory with host), but has to be mapped explicitly.

I don't quite understand how link should work.  OpenMP 4.5 says:

"The list items of a link clause are not mapped by the declare target directive.
Instead, their mapping is deferred until they are mapped by target data or
target constructs. They are mapped only for such regions."

But doesn't this mean that the example below should work identically
with/without USE_LINK defined?  Or is there some difference on other testcases?

int a = 1;

#ifdef USE_LINK
#pragma omp declare target link(a)
#endif

int main ()
{
  a = 2;
  int res;
  #pragma omp target map(to: a) map(from: res)
res = a;
  return res;
}

> This patch only marks them with the new attribute, the actual middle-end
> implementation needs to be implemented.
> 
> I believe OpenACC has something similar, but no idea if it is already
> implemented.
> 
> Anyway, I think the implementation should be that in some pass running on
> the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)

HSA does not define ACCEL_COMPILER, because it uses only one compiler.

> we actually replace the variables with pointers to variables, then need
> to somehow also mark those in the offloading tables, so that the library

I see 2 possible options: use the MSB of the size, or introduce the third field
for flags.
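
A rough sketch in C of the first option (flag in the MSB of the size), just to
make the idea concrete.  The names struct var_entry, LINK_BIT, encode_size,
entry_is_link and entry_size are hypothetical, and a 64-bit size field is an
assumption of this sketch, not the actual table layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table entry: an address/size pair.  */
struct var_entry
{
  uint64_t addr;
  uint64_t size;	/* the MSB doubles as the "link" flag */
};

#define LINK_BIT (UINT64_C (1) << 63)

/* Pack the flag into the size; real sizes must not use the top bit.  */
uint64_t
encode_size (uint64_t size, bool is_link)
{
  assert ((size & LINK_BIT) == 0);
  return is_link ? (size | LINK_BIT) : size;
}

bool
entry_is_link (const struct var_entry *e)
{
  return (e->size & LINK_BIT) != 0;
}

/* The size with the flag bit masked off.  */
uint64_t
entry_size (const struct var_entry *e)
{
  return e->size & ~LINK_BIT;
}
```

The advantage is that the table layout does not change; the cost is that every
reader of the size has to mask the bit, which a separate third field would
avoid.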

> registers them (the locations of the pointers to the vars), but also marks
> them for special treatment, and then when actually trying to map them
> (or their parts, guess that needs to be discussed) we allocate them or
> whatever is requested and store the device pointer into the corresponding
> variable.
> 
> Ilya, Thomas, thoughts on this?

  -- Ilya


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2015-10-22 Thread Ilya Verbin
On Mon, Dec 22, 2014 at 13:01:40 +0100, Thomas Schwinge wrote:
> By chance (when tracking down a different problem), I've found the
> following.  Would you please check whether that's a real problem in
> liboffloadmic, or its libgomp plugin, or just a mis-diagnosis by
> Valgrind?
> 
> ==21327== Syscall param write(buf) points to uninitialised byte(s)

Finally we have investigated this :)  Valgrind warns about uninitialized bytes,
inserted into the struct for alignment.  It's possible to avoid the warning by
the patch below.  Should I commit it, or just leave it as is?
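
For illustration, a minimal sketch in C of where such bytes come from.  The
layout of struct func_desc here is hypothetical (two uint8_t flags followed by
a wider field and a trailing name), not the real FunctionDescriptor.  The
sketch allocates with calloc, so the buffer starts fully zeroed; strictly
speaking the C standard leaves padding values unspecified after a member store,
so this relies on typical compiler behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor: two byte-sized flags, then an 8-byte field.
   The compiler typically inserts padding after timer_enabled.  */
struct func_desc
{
  uint8_t console_enabled;
  uint8_t timer_enabled;
  uint64_t data_size;
  char data[];		/* trailing name, NUL-terminated */
};

/* Number of padding bytes between the flags and data_size.  */
size_t
padding_bytes (void)
{
  return offsetof (struct func_desc, data_size)
	 - (offsetof (struct func_desc, timer_enabled) + sizeof (uint8_t));
}

/* Allocate a zero-filled descriptor and fill in the named fields.
   *TOTAL receives the number of bytes that would be written out.  */
struct func_desc *
func_desc_new (const char *name, size_t *total)
{
  assert (name != NULL && total != NULL);
  size_t name_len = strlen (name) + 1;
  size_t size = sizeof (struct func_desc) + name_len;
  struct func_desc *d = calloc (1, size);
  if (d == NULL)
    return NULL;
  d->console_enabled = 1;
  d->timer_enabled = 0;
  d->data_size = name_len;
  memcpy (d->data, name, name_len);
  *total = size;
  return d;
}
```

With malloc instead of calloc, the bytes counted by padding_bytes would be
whatever the allocator returned, and writing the whole buffer to a file
descriptor is what triggers the Valgrind report.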


diff --git a/liboffloadmic/runtime/offload_host.cpp b/liboffloadmic/runtime/offload_host.cpp
index d04233f..66c2a01 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -2425,6 +2425,7 @@ bool OffloadDescriptor::setup_misc_data(const char *name)
misc_data_size);
 if (m_func_desc == NULL)
   LIBOFFLOAD_ERROR(c_malloc);
+   memset (m_func_desc, 0, m_func_desc_size + misc_data_size);
 m_func_desc->console_enabled = console_enabled;
 m_func_desc->timer_enabled = offload_report_enabled &&
 (timer_enabled || offload_report_level);


  -- Ilya


Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin <iver...@gmail.com> wrote:
> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin <iver...@gmail.com> wrote:
> >> > H.J.,
> >> > Maybe linker should print some warning about joining writable + 
> >> > nonwritable
> >> > sections?  Here is a simple testcase:
> >> >
> >> > $ cat t1.s
> >> > .section ".AAA", "a"
> >> > .long 0x12345678
> >> > $ cat t2.s
> >> > .section ".AAA", "wa"
> >> > .long 0x12345678
> >> > $ as t1.s -o t1.o
> >> > $ as t2.s -o t2.o
> >> > $ ld -shared t1.o t2.o
> >> > $ ls -lh a.out
> >> > 2.1M a.out
> >> >
> >>
> >> Does linker make AAA  writable? If yes, linker does what it
> >> is told.
> >
> > Yes, it makes it writable, but why does it also do this?
> >
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH 00b0  00b0
> >0028  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >0078  0018   A   3 2 8
> >   [ 3] .dynstr   STRTAB   0150  0150
> >0019     A   0 0 1
> >   [ 4] .AAA  PROGBITS 0169  0169
> >0008    WA   0 0 1
> >   [ 5] .eh_frame PROGBITS 0178  0178
> >     A   0 0 8
> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- ???
> >00b0  0010  WA   3 0 8
> >   [ 7] .shstrtab STRTAB     00200380
> >0049     0 0 1
> >   [ 8] .symtab   SYMTAB     00200228
> >0120  0018   9 9 8
> >   [ 9] .strtab   STRTAB     00200348
> >0038     0 0 1
> >
> 
> Linker groups input sections by section name and ors section
> flags.

Could you please help figure out how this number 0x200178 is calculated?
ld -verbose doesn't show anything helpful.  It seems that something goes wrong
during section-to-segment mapping, because when both .AAA have "wa" flags, we
get a small binary with 2 LOAD segments:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x01a8 0x01a8  R  20
  LOAD   0x01a8 0x002001a8 0x002001a8
 0x00b8 0x00b8  RW 20

But when one .AAA has the "a" flag and the other .AAA has the "wa" flag, we get
a huge binary with only one big LOAD segment:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x00200228 0x00200228  RW 20

BTW, gold produces a small binary in both cases.
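
Just to spell out the flag merging described above ("groups input sections by
section name and ors section flags"), here is a toy model in C.  It is not
linker code: struct in_section and merged_flags are hypothetical names; only
the SHF_WRITE and SHF_ALLOC values are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF section flag values.  */
#define SHF_WRITE 0x1u
#define SHF_ALLOC 0x2u

/* Hypothetical view of one input section.  */
struct in_section
{
  const char *name;
  uint32_t flags;
};

/* OR together the flags of all input sections called NAME.  */
uint32_t
merged_flags (const struct in_section *in, size_t n, const char *name)
{
  assert (in != NULL || n == 0);
  uint32_t flags = 0;
  for (size_t i = 0; i < n; i++)
    if (strcmp (in[i].name, name) == 0)
      flags |= in[i].flags;
  return flags;
}
```

For the testcase above, an "a" .AAA merged with a "wa" .AAA gives a writable
output section.  This explains the WA flags, but not the 2MB gap in the file.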

Thanks,
  -- Ilya


Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Thu, Oct 22, 2015 at 07:35:55 -0700, H.J. Lu wrote:
> On Thu, Oct 22, 2015 at 7:11 AM, Ilya Verbin <iver...@gmail.com> wrote:
> > On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin <iver...@gmail.com> wrote:
> >> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin <iver...@gmail.com> wrote:
> >> >> > H.J.,
> >> >> > Maybe linker should print some warning about joining writable + 
> >> >> > nonwritable
> >> >> > sections?  Here is a simple testcase:
> >> >> >
> >> >> > $ cat t1.s
> >> >> > .section ".AAA", "a"
> >> >> > .long 0x12345678
> >> >> > $ cat t2.s
> >> >> > .section ".AAA", "wa"
> >> >> > .long 0x12345678
> >> >> > $ as t1.s -o t1.o
> >> >> > $ as t2.s -o t2.o
> >> >> > $ ld -shared t1.o t2.o
> >> >> > $ ls -lh a.out
> >> >> > 2.1M a.out
> >> >> >
> >> >>
> >> >> Does linker make AAA  writable? If yes, linker does what it
> >> >> is told.
> >> >
> >> > Yes, it makes it writable, but why does it also do this?
> >> >
> >> >   [Nr] Name  Type Address   Offset
> >> >Size  EntSize  Flags  Link  Info  Align
> >> >   [ 0]   NULL   
> >> >     0 0 0
> >> >   [ 1] .hash HASH 00b0  00b0
> >> >0028  0004   A   2 0 8
> >> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >> >0078  0018   A   3 2 8
> >> >   [ 3] .dynstr   STRTAB   0150  0150
> >> >0019     A   0 0 1
> >> >   [ 4] .AAA  PROGBITS 0169  0169
> >> >0008    WA   0 0 1
> >> >   [ 5] .eh_frame PROGBITS 0178  0178
> >> >     A   0 0 8
> >> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- ???
> >> >00b0  0010  WA   3 0 8
> >> >   [ 7] .shstrtab STRTAB     00200380
> >> >0049     0 0 1
> >> >   [ 8] .symtab   SYMTAB     00200228
> >> >0120  0018   9 9 8
> >> >   [ 9] .strtab   STRTAB     00200348
> >> >0038     0 0 1
> >> >
> >>
> >> Linker groups input sections by section name and ors section
> >> flags.
> >
> > Could you please help figure out how this number 0x200178 is calculated?
> > ld -verbose doesn't show anything helpful.  It seems that something goes 
> > wrong
> > during section-to-segment mapping, because when both .AAA have "wa" flags, 
> > we
> > got small binary with 2 LOAD segments:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x01a8 0x01a8  R  20
> >   LOAD   0x01a8 0x002001a8 0x002001a8
> >  0x00b8 0x00b8  RW 20
> >
> > But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
> > binary with only one big LOAD segment:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x00200228 0x00200228  RW 20
> >
> > BTW, gold produces small binary in both cases.
> >
> 
> Please open a binutils bug with a testcase.

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=19162

  -- Ilya


Re: Constify host-side offload data`

2015-10-21 Thread Ilya Verbin
Hi!

On Wed, Jul 15, 2015 at 20:56:50 -0400, Nathan Sidwell wrote:
> --- libgcc/offloadstuff.c (revision 225851)
> +++ libgcc/offloadstuff.c (working copy)
> ...
> -void *__offload_func_table[0]
> +const void *const __offload_func_table[0]
> ...
> -void *__offload_var_table[0]
> +const void *const __offload_var_table[0]

I've just noticed that this patch + a similar change in intelmic-mkoffload.c
bumps up the file size of "helloworld" with offloading to MIC from 17KB to 4MB!

This happens because the .gnu.offload_{funcs,vars} sections in
crtoffload{begin,end}.o now don't have the WRITE flag, but the same sections
produced by omp_finish_file do.  When the linker joins writable + nonwritable
sections from several objects, it inserts some weird 2MB offset into the final
binary.  I.e. now there are 2 such offsets: one in the host binary and one in
the MIC target image, hence 4MB.  I haven't investigated how it happens, because
I think it's a bad idea to join sections with different flags.

But we can't make .gnu.offload_{funcs,vars} in omp_finish_file also readonly,
because in case of shared libraries there are R_X86_64_RELATIVE relocations,
which make these sections writable.  So, I guess we need to remove all consts to
make these sections writable in all objects.

H.J.,
Maybe the linker should print some warning about joining writable + nonwritable
sections?  Here is a simple testcase:

$ cat t1.s
.section ".AAA", "a"
.long 0x12345678
$ cat t2.s
.section ".AAA", "wa"
.long 0x12345678
$ as t1.s -o t1.o
$ as t2.s -o t2.o
$ ld -shared t1.o t2.o
$ ls -lh a.out
2.1M a.out

  -- Ilya


Re: Constify host-side offload data`

2015-10-21 Thread Ilya Verbin
On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin <iver...@gmail.com> wrote:
> > H.J.,
> > Maybe linker should print some warning about joining writable + nonwritable
> > sections?  Here is a simple testcase:
> >
> > $ cat t1.s
> > .section ".AAA", "a"
> > .long 0x12345678
> > $ cat t2.s
> > .section ".AAA", "wa"
> > .long 0x12345678
> > $ as t1.s -o t1.o
> > $ as t2.s -o t2.o
> > $ ld -shared t1.o t2.o
> > $ ls -lh a.out
> > 2.1M a.out
> >
> 
> Does linker make AAA  writable? If yes, linker does what it
> is told.

Yes, it makes it writable, but why does it also do this?

  [Nr] Name  Type Address   Offset
   Size  EntSize  Flags  Link  Info  Align
  [ 0]   NULL   
        0 0 0
  [ 1] .hash HASH 00b0  00b0
   0028  0004   A   2 0 8
  [ 2] .dynsym   DYNSYM   00d8  00d8
   0078  0018   A   3 2 8
  [ 3] .dynstr   STRTAB   0150  0150
   0019     A   0 0 1
  [ 4] .AAA  PROGBITS 0169  0169
   0008    WA   0 0 1
  [ 5] .eh_frame PROGBITS 0178  0178
        A   0 0 8
  [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- ???
   00b0  0010  WA   3 0 8
  [ 7] .shstrtab STRTAB     00200380
   0049     0 0 1
  [ 8] .symtab   SYMTAB     00200228
   0120  0018   9 9 8
  [ 9] .strtab   STRTAB     00200348
   0038     0 0 1

  -- Ilya


Re: [OpenACC 11/11] execution tests

2015-10-21 Thread Ilya Verbin


> On 21 Oct 2015, at 22:53, Nathan Sidwell wrote:
> 
> This patch has some new execution tests, verifying loop partitioning is 
> behaving as expected.
> 
> There are more execution tests on the gomp4 branch, but many of them use 
> reductions.  We'll merge those once reductions are merged.
> 
> nathan
> <11-trunk-tests.patch>

Does the testcase with offload IR appear here accidentally?

  -- Ilya

Re: [gomp4] lto error message

2015-10-20 Thread Ilya Verbin
On Tue, Oct 20, 2015 at 15:54:45 -0400, Nathan Sidwell wrote:
> @@ -1209,16 +1209,11 @@ input_overwrite_node (struct lto_file_de
>  
>if (!success)
>  {
> -  if (flag_openacc)
> - {
> -   if (TREE_CODE (node->decl) == FUNCTION_DECL)
> - error ("Missing routine function %<%s%>", node->name ());
> -   else
> - error ("Missing declared variable %<%s%>", node->name ());
> - }
> -
> +  gcc_assert (flag_openacc);
> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
> + error ("missing OpenACC % function %qD", node->decl);
>else
> - gcc_unreachable ();
> + error ("missing OpenACC % variable %qD", node->decl);
>  }
>  }

There might be a situation when some func or var is lost during regular LTO,
even if flag_openacc is present.  In this case "missing OpenACC ..." message
would be wrong.  And if flag_openacc is absent, gcc_assert (flag_openacc) is a
bit confusing.  We discussed this with Cesar here:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02076.html

  -- Ilya


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-19 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> >  
> > +  if (flags & GOMP_TARGET_FLAG_NOWAIT)
> > +{
> > +  gomp_create_target_task (devicep, fn_addr, mapnum, hostaddrs, sizes,
> > +  kinds, flags, depend);
> > +  return;
> > +}
> 
> But this is not ok.  You need to do this far earlier, already before the
> if (depend != NULL) code in GOMP_target_41.  And, I think you should just
> not pass fn_addr, but fn itself.
> 
> > @@ -1636,34 +1657,58 @@ void
> >  gomp_target_task_fn (void *data)
> >  {
> >struct gomp_target_task *ttask = (struct gomp_target_task *) data;
> > +  struct gomp_device_descr *devicep = ttask->devicep;
> > +
> >if (ttask->fn != NULL)
> >  {
> > -  /* GOMP_target_41 */
> > +  if (devicep == NULL
> > + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > +   {
> > + /* FIXME: Save host fn addr into gomp_target_task?  */
> > + gomp_target_fallback_firstprivate (NULL, ttask->mapnum,
> 
> If you pass above fn instead of fn_addr, ttask->fn is what you want
> to pass to gomp_target_fallback_firstprivate here and remove the FIXME.
> 
> > +ttask->hostaddrs, ttask->sizes,
> > +ttask->kinds);
> > + return;
> > +   }
> > +
> > +  struct target_mem_desc *tgt_vars
> > +   = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> > +ttask->sizes, ttask->kinds, true,
> > +GOMP_MAP_VARS_TARGET);
> > +  devicep->async_run_func (devicep->target_id, ttask->fn,
> > +  (void *) tgt_vars->tgt_start, data);
> 
> You need to void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> first obviously, and pass fn_addr.
> 
> > +
> > +  /* FIXME: TMP example of checking for completion.
> > +Alternatively the plugin can set some completion flag in ttask.  */
> > +  while (!devicep->async_is_completed_func (devicep->target_id, data))
> > +   {
> > + fprintf (stderr, "-");
> > + usleep (10);
> > +   }
> 
> This obviously doesn't belong here.
> 
> >if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include 
> > +#include 
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (500);
> 
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.
> 
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 26ac6fe..c843710 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> ...
> > +/* Set of asynchronously running target tasks.  */
> > +static std::set *async_tasks;
> > +
> >  /* Thread-safe registration of the main image.  */
> >  static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
> >  
> > +/* Mutex for protecting async_tasks.  */
> > +static pthread_mutex_t async_tasks_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> >  static VarDesc vd_host2tgt = {
> >{ 1, 1 },  /* dst, src */
> >{ 1, 0 },  /* in, out  */
> > @@ -156,6 +163,8 @@ init (void)
> >  
> >  out:
> >address_table = new ImgDevAddrMap;
> > +  async_tasks = new std::set;
> > +  pthread_mutex_init (&async_tasks_lock, NULL);
> 
> PTHREAD_MUTEX_INITIALIZER should already initialize the lock.
> But, do you really need async_tasks and the lock?  Better store
> something into some plugin's owned field in target_task struct and
> let the plugin callback be passed address of that field rather than the
> whole target_task?

So, here is what I have for now.  The attached target-29.c testcase works fine
with MIC emul; however, I don't know how (and where) to properly check for
completion of async execution on the target.  And, similarly, where should the
unmapping be done after that?  Do we need a callback from the plugin to libgomp
(as far as I understood, the PTX runtime supports this, but HSA doesn't), or
will libgomp just check ttask->is_completed in task.c?
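
To make the second option concrete, here is a minimal sketch of the polling
variant, assuming the plugin sets a completion flag that libgomp reads.  All
demo_* names are hypothetical stand-ins, not libgomp code; a plain thread plays
the role of the plugin.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct gomp_target_task; only the
   completion flag and a result slot matter for this sketch.  */
struct demo_target_task
{
  atomic_bool is_completed;
  int result;
};

/* Plays the role of the plugin: does the "offloaded" work, then
   publishes completion with release semantics.  */
static void *
demo_plugin_run (void *data)
{
  struct demo_target_task *ttask = data;
  ttask->result = 42;
  atomic_store_explicit (&ttask->is_completed, true, memory_order_release);
  return NULL;
}

/* Plays the role of libgomp: starts the task, polls the flag, and only
   then performs the finalization step (where unmapping would go).  */
static int
demo_run_and_poll (void)
{
  struct demo_target_task ttask;
  pthread_t thr;

  atomic_init (&ttask.is_completed, false);
  ttask.result = 0;
  if (pthread_create (&thr, NULL, demo_plugin_run, &ttask) != 0)
    return -1;

  /* Poll instead of sleeping on a semaphore.  */
  while (!atomic_load_explicit (&ttask.is_completed, memory_order_acquire))
    sched_yield ();

  /* Finalization (e.g. unmapping) would happen here.  */
  pthread_join (thr, NULL);
  return ttask.result;
}
```

Calling demo_run_and_poll () returns the value written by the worker once the
flag is observed.  A callback-based design would avoid the polling loop, but it
needs support from the device runtime.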

 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e707c80 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -430,6 +430,7 @@ struct gomp_target_task
   size_t *sizes;
   unsigned short *kinds;
   unsigned int flags;
+  bool is_completed;
   void *hostaddrs[];
 };
 
@@ -877,6 +878,7 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void 

Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-19 Thread Ilya Verbin
On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> 
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Maybe it was caused by this change in gomp_unmap_vars?
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html

Looking at the code, I don't see any difference in async_refcount handling, but
I was unable to test it without having hardware :(

  -- Ilya


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-15 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> On Fri, Oct 02, 2015 at 10:28:01PM +0300, Ilya Verbin wrote:
> > Here is my WIP patch.  target.c part is obviously incorrect, but it 
> > demonstrates
> > a possible libgomp <-> plugin interface for running a target task function
> > asynchronously and checking whether it is completed or not.
> > (Refactored liboffloadmic/runtime/emulator from trunk is required to run
> > target-tmp.c testcase.)
> 
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index 77bd442..31f034c 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -45,6 +45,10 @@
> >  #include "plugin-suffix.h"
> >  #endif
> >  
> > +/* FIXME: TMP */
> > +#include 
> > +#include 
> 
> I hope you mean to remove this later on.

Sure, this is just a prototype, not for committing.


> > @@ -1227,6 +1231,44 @@ gomp_target_fallback (void (*fn) (void *), void 
> > **hostaddrs)
> >*thr = old_thr;
> >  }
> >  
> > +/* Host fallback with firstprivate map-type handling.  */
> > +
> > +static void
> > +gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> > +  void **hostaddrs, size_t *sizes,
> > +  unsigned short *kinds)
> > +{
> > +  size_t i, tgt_align = 0, tgt_size = 0;
> > +  char *tgt = NULL;
> > +  for (i = 0; i < mapnum; i++)
> > +if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > +  {
> > +   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > +   if (tgt_align < align)
> > + tgt_align = align;
> > +   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > +   tgt_size += sizes[i];
> > +  }
> > +  if (tgt_align)
> > +{
> > +  tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > +  uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
> > +  if (al)
> > +   tgt += tgt_align - al;
> > +  tgt_size = 0;
> > +  for (i = 0; i < mapnum; i++)
> > +   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > + {
> > +   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > +   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > +   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
> > +   hostaddrs[i] = tgt + tgt_size;
> > +   tgt_size = tgt_size + sizes[i];
> > + }
> > +}
> > +  gomp_target_fallback (fn, hostaddrs);
> > +}
> 
> This is ok.
> 
> >  /* Helper function of GOMP_target{,_41} routines.  */
> >  
> >  static void *
> > @@ -1311,40 +1353,19 @@ GOMP_target_41 (int device, void (*fn) (void *), 
> > size_t mapnum,
> >if (devicep == NULL
> >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> >  {
> > -  size_t i, tgt_align = 0, tgt_size = 0;
> > -  char *tgt = NULL;
> > -  for (i = 0; i < mapnum; i++)
> > -   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > - {
> > -   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > -   if (tgt_align < align)
> > - tgt_align = align;
> > -   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -   tgt_size += sizes[i];
> > - }
> > -  if (tgt_align)
> > -   {
> > - tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > - uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
> > - if (al)
> > -   tgt += tgt_align - al;
> > - tgt_size = 0;
> > - for (i = 0; i < mapnum; i++)
> > -   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
> > - {
> > -   size_t align = (size_t) 1 << (kinds[i] >> 8);
> > -   tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
> > -   hostaddrs[i] = tgt + tgt_size;
> > -   tgt_size = tgt_size + sizes[i];
> > - }
> > -   }
> > -  gomp_target_fallback (fn, hostaddrs);
> > +  gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, 
> > kinds);
> >return;
> >  }
> 
> This too.

I will commit this small part to gomp-4_5-branch separately.
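
For reference, the size computation in the factored-out helper can be sketched
on its own.  This is only an illustration, assuming (as in the quoted code) that
kinds[i] >> 8 holds log2 of the alignment; the demo_* names are hypothetical,
and the sketch skips the GOMP_MAP_FIRSTPRIVATE filter and the actual copying.

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN, which must be a
   power of two.  */
static size_t
demo_align_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Compute where each of N objects would land when packed into one
   buffer.  kinds[i] >> 8 is taken to be log2 of the alignment.
   Stores each offset in OFFSETS, the largest alignment in *MAX_ALIGN,
   and returns the total packed size.  */
static size_t
demo_pack_offsets (size_t n, const size_t *sizes,
                   const unsigned short *kinds,
                   size_t *offsets, size_t *max_align)
{
  size_t total = 0;
  *max_align = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t align = (size_t) 1 << (kinds[i] >> 8);
      if (*max_align < align)
        *max_align = align;
      total = demo_align_up (total, align);
      offsets[i] = total;
      total += sizes[i];
    }
  return total;
}
```

With sizes {1, 8, 2} and alignments {1, 8, 2}, the objects land at offsets 0, 8
and 16, for a total of 18 bytes.  The helper then allocates the total plus
(max alignment - 1) bytes so that the start of the buffer can be aligned too.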


> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 

Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-10-13 Thread Ilya Verbin
On Mon, Jun 15, 2015 at 22:48:50 +0300, Ilya Verbin wrote:
> @@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
> ...
> +  devicep->run_func (devicep->target_id, fn_addr, (void *) 
> tgt_vars->tgt_start);

If mapnum is 0, tgt_vars->tgt_start is uninitialized.  This is not a big bug,
because in this case the target function doesn't use this pointer, however
valgrind warns about sending uninitialized data to target.
OK for gomp-4_1-branch?


libgomp/
* target.c (gomp_map_vars): Zero tgt->tgt_start when mapnum is 0.


diff --git a/libgomp/target.c b/libgomp/target.c
index 95360d1..c4e3323 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -323,6 +323,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
 = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
+  tgt->tgt_start = 0;
   tgt->list_count = mapnum;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;
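
The same pattern in isolation, as a hedged sketch: a descriptor with a flexible
array member comes from malloc, so every field that may be read later has to be
set explicitly, even when the list is empty.  demo_mem_desc and demo_alloc_desc
are hypothetical names, not the libgomp types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced analogue of struct target_mem_desc.  */
struct demo_mem_desc
{
  uintptr_t tgt_start;
  size_t list_count;
  int list[];
};

/* Allocate a descriptor for MAPNUM entries.  malloc does not clear
   memory, so tgt_start is zeroed here; otherwise it would stay
   indeterminate when MAPNUM is 0 and nothing assigns it later.  */
static struct demo_mem_desc *
demo_alloc_desc (size_t mapnum)
{
  struct demo_mem_desc *tgt
    = malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
  if (tgt == NULL)
    return NULL;
  tgt->tgt_start = 0;
  tgt->list_count = mapnum;
  return tgt;
}
```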


  -- Ilya


Re: libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock

2015-10-09 Thread Ilya Verbin
On Fri, Oct 09, 2015 at 13:58:32 +0200, Bernd Schmidt wrote:
> One oddity I noticed in target.c is that there are two different num_devices
> variables:
> 
>   /* Total number of available devices.  */
>   static int num_devices;
> 
>   /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>   static int num_devices_openmp;
> 
> Confusingly, the get_num_devices function returns num_devices_openmp. That
> function includes a pthread_once call to gomp_target_init, which sets up
> these variables. References to num_devices_openmp through get_num_devices
> are therefore guaranteed to be initialized. However, there are direct
> references to num_devices, in GOMP_offload_register_ver and
> GOMP_offload_unregister_ver, and they don't seem to enforce any kind of
> initialization:
> 
>   /* Load image to all initialized devices.  */
>   for (i = 0; i < num_devices; i++)
> {
>   struct gomp_device_descr *devicep = &devices[i];
>   gomp_mutex_lock (&devicep->lock);
>   if (devicep->type == target_type && devicep->is_initialized)
> gomp_load_image_to_device (devicep, version,
>host_table, target_data, true);
>   gomp_mutex_unlock (&devicep->lock);
> }
> 
> I'm guessing this only triggers when dlopening something with an offload
> image after devices have been initialized already, and it looks like we have
> symmetrical code in gomp_init_device.

Right, this code offloads the given image to all initialized devices, and the
similar code in gomp_init_device offloads all registered images to a given device.

> Wouldn't it be possible/better to
> force a gomp_target_init before referencing num_devices, and then relying on
> the code I quoted and deleting the image loading from gomp_init_device?

gomp_target_init only loads plugins and sets num_devices/num_devices_openmp, but
it doesn't call gomp_init_device, because we wanted to defer device
initialization as much as possible.  So, gomp_init_device is called immediately
before usage of that device.

  -- Ilya


Re: [gomp4.1] OpenMP 4.1 is dead, long live OpenMP 4.5

2015-10-09 Thread Ilya Verbin
On Fri, Oct 09, 2015 at 09:55:07 +0200, Jakub Jelinek wrote:
> -GOMP_4.1 {
> +GOMP_4.5 {
>global:
>   GOMP_target_41;
>   GOMP_target_data_41;

Should we rename it to GOMP_target*_45, or do you know some more mnemonic name?

  -- Ilya


Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-10-08 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 18:15:14 +0200, Jakub Jelinek wrote:
> > -char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
> > +char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
> >  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
> >  putenv(env_var);  
> 
> Missing error handling if malloc returns NULL?

Fixed.

On Mon, Sep 28, 2015 at 09:19:30 -0700, Andrew Pinski wrote:
> Also why not just use strdup here? instead of malloc/sizeof/sprintf ?

Done.

Committed as obvious.
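
As an illustration of the size calculation for the NAME=VALUE case, here is a
hedged sketch (demo_make_env_entry is a hypothetical helper, not part of
liboffloadmic).  The length comes from strlen of both parts, since sizeof
applied to a char * only yields the pointer size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a malloc'ed "NAME=VALUE" string suitable for putenv.
   Returns NULL if the allocation fails.  The caller must not free
   the string after handing it to putenv.  */
static char *
demo_make_env_entry (const char *name, const char *value)
{
  /* name + '=' + value + terminating NUL.  */
  size_t len = strlen (name) + 1 + strlen (value) + 1;
  char *entry = malloc (len);
  if (entry == NULL)
    return NULL;
  snprintf (entry, len, "%s=%s", name, value);
  return entry;
}
```

In the patch, sizeof ("COI_HOST_THREAD_AFFINITY=") already counts the
terminating NUL of the literal, so adding strlen (env_var) gives exactly the
required size.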


liboffloadmic/
* runtime/offload_engine.cpp (Engine::init_process): Use strdup instead
of sizeof+malloc+sprintf, check for return value.
* runtime/offload_env.cpp (MicEnvVar::get_env_var_kind): Check for
strdup return value.
* runtime/offload_host.cpp (__offload_init_library_once): Check for
strdup return value.  Fix size calculation of COI_HOST_THREAD_AFFINITY.
* runtime/emulator/coi_device.cpp (COIProcessWaitForShutdown): Check for
malloc return value.


diff --git a/liboffloadmic/runtime/offload_engine.cpp 
b/liboffloadmic/runtime/offload_engine.cpp
index 00b673a..4a88546 100644
--- a/liboffloadmic/runtime/offload_engine.cpp
+++ b/liboffloadmic/runtime/offload_engine.cpp
@@ -173,8 +173,9 @@ void Engine::init_process(void)
 // use putenv instead of setenv as Windows has no setenv.
 // Note: putenv requires its argument can't be freed or modified.
 // So no free after call to putenv or elsewhere.
-char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
-sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
+char * env_var = strdup("COI_DMA_CHANNEL_COUNT=2");
+   if (env_var == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 putenv(env_var);  
 }
 }
diff --git a/liboffloadmic/runtime/offload_env.cpp 
b/liboffloadmic/runtime/offload_env.cpp
index 79f5f36..ac33b67 100644
--- a/liboffloadmic/runtime/offload_env.cpp
+++ b/liboffloadmic/runtime/offload_env.cpp
@@ -212,10 +212,14 @@ MicEnvVarKind MicEnvVar::get_env_var_kind(
 *env_var_name_length = 3;
 *env_var_name = *env_var_def = c;
 *env_var_def = strdup(*env_var_def);
+   if (*env_var_def == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 return  c_mic_var;
 }
 *env_var_def = c + strlen("ENV=");
 *env_var_def = strdup(*env_var_def);
+   if (*env_var_def == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 return c_mic_card_env;
 }
 if (isalpha(*c)) {
@@ -229,6 +233,8 @@ MicEnvVarKind MicEnvVar::get_env_var_kind(
 return c_no_mic;
 }
 *env_var_def = strdup(*env_var_def);
+if (*env_var_def == NULL)
+  LIBOFFLOAD_ERROR(c_malloc);
 return card_is_set? c_mic_card_var : c_mic_var;
 }
 
diff --git a/liboffloadmic/runtime/offload_host.cpp 
b/liboffloadmic/runtime/offload_host.cpp
index 08f626f..eec457d 100644
--- a/liboffloadmic/runtime/offload_host.cpp
+++ b/liboffloadmic/runtime/offload_host.cpp
@@ -5173,6 +5173,8 @@ static void __offload_init_library_once(void)
 if (strcasecmp(env_var, "none") != 0) {
 // value is composed of comma separated physical device indexes
 char *buf = strdup(env_var);
+   if (buf == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 char *str, *ptr;
 for (str = strtok_r(buf, ",", &ptr); str != 0;
  str = strtok_r(0, ",", &ptr)) {
@@ -5245,7 +5247,9 @@ static void __offload_init_library_once(void)
 if (env_var != 0) {
 char * new_env_var =
(char*) malloc(sizeof("COI_HOST_THREAD_AFFINITY=") +
-  sizeof(env_var) + 1);
+  strlen(env_var));
+   if (new_env_var == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 sprintf(new_env_var, "COI_HOST_THREAD_AFFINITY=%s", env_var);
 putenv(new_env_var);
 }
@@ -5254,6 +5258,8 @@ static void __offload_init_library_once(void)
 env_var = getenv("MIC_LD_LIBRARY_PATH");
 if (env_var != 0) {
 mic_library_path = strdup(env_var);
+   if (mic_library_path == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 }
 
 
@@ -5262,6 +5268,8 @@ static void __offload_init_library_once(void)
 const char *base_name = "offload_main";
 if (mic_library_path != 0) {
 char *buf = strdup(mic_library_path);
+   if (buf == NULL)
+ LIBOFFLOAD_ERROR(c_malloc);
 char *try_name = (char*) alloca(strlen(mic_library_path) +
 strlen(base_name) + 2);
 char *dir, *ptr;
@@ -5275,6 +5283,8 @@ static void __offload_init_library_once(void)
 struct stat st;
 if (stat(try_name, &st) == 0 && S_ISREG(st.st_mode)) {
 mic_device_main = strdup(try_name);
+   if (mic_device_main == NULL)
+   

Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-02 Thread Ilya Verbin
Hi!

On Tue, Sep 08, 2015 at 11:20:14 +0200, Jakub Jelinek wrote:
> nowait support for #pragma omp target is not implemented yet, supposedly we
> need to mark those somehow (some flag) already in the struct gomp_task
> structure, essentially it will need either 2 or 3 callbacks
> (the current one, executed when the dependencies are resolved (it actually
> waits until some thread schedules it after that point, I think it is
> undesirable to run it with the tasking lock held), which would perform
> the gomp_map_vars and initiate the running of the region, and then some
> query routine which would poll the plugin whether the task is done or not,
> and either perform the finalization (unmap_vars) if it is done (and in any
> case return bool whether it should be polled again or not), and if the
> finalization is not done there, also another callback for the finalization.
> Also, there is the issue that if we are waiting for task that needs to be
> polled, and we don't have any further tasks to run, we shouldn't really
> attempt to sleep on some semaphore (e.g. in taskwait, end of
> taskgroup, etc.) or barrier, but rather either need to keep polling it, or
> call the query hook with some argument that it should sleep in there until
> the work is done by the offloading device.
> Also, there needs to be a way for the target nowait first callback to say
> that it is using host fallback and thus acts as a normal task, therefore
> once the task fn finishes, the task is done.

Here is my WIP patch.  The target.c part is obviously incorrect, but it demonstrates
a possible libgomp <-> plugin interface for running a target task function
asynchronously and checking whether it is completed or not.
(Refactored liboffloadmic/runtime/emulator from trunk is required to run
target-tmp.c testcase.)


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d798321..8e2b5aa 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -872,6 +872,8 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
+  void (*async_run_func) (int, void *, void *, const void *);
+  bool (*async_is_completed_func) (int, const void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s mem_map;
diff --git a/libgomp/target.c b/libgomp/target.c
index 77bd442..31f034c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,10 @@
 #include "plugin-suffix.h"
 #endif
 
+/* FIXME: TMP */
+#include 
+#include 
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -1227,6 +1231,44 @@ gomp_target_fallback (void (*fn) (void *), void 
**hostaddrs)
   *thr = old_thr;
 }
 
+/* Host fallback with firstprivate map-type handling.  */
+
+static void
+gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
+  void **hostaddrs, size_t *sizes,
+  unsigned short *kinds)
+{
+  size_t i, tgt_align = 0, tgt_size = 0;
+  char *tgt = NULL;
+  for (i = 0; i < mapnum; i++)
+if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
+  {
+   size_t align = (size_t) 1 << (kinds[i] >> 8);
+   if (tgt_align < align)
+ tgt_align = align;
+   tgt_size = (tgt_size + align - 1) & ~(align - 1);
+   tgt_size += sizes[i];
+  }
+  if (tgt_align)
+{
+  tgt = gomp_alloca (tgt_size + tgt_align - 1);
+  uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
+  if (al)
+   tgt += tgt_align - al;
+  tgt_size = 0;
+  for (i = 0; i < mapnum; i++)
+   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
+ {
+   size_t align = (size_t) 1 << (kinds[i] >> 8);
+   tgt_size = (tgt_size + align - 1) & ~(align - 1);
+   memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
+   hostaddrs[i] = tgt + tgt_size;
+   tgt_size = tgt_size + sizes[i];
+ }
+}
+  gomp_target_fallback (fn, hostaddrs);
+}
+
 /* Helper function of GOMP_target{,_41} routines.  */
 
 static void *
@@ -1311,40 +1353,19 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t 
mapnum,
   if (devicep == NULL
   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
 {
-  size_t i, tgt_align = 0, tgt_size = 0;
-  char *tgt = NULL;
-  for (i = 0; i < mapnum; i++)
-   if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
- {
-   size_t align = (size_t) 1 << (kinds[i] >> 8);
-   if (tgt_align < align)
- tgt_align = align;
-   tgt_size = (tgt_size + align - 1) & ~(align - 1);
-   tgt_size += sizes[i];
- }
-  if (tgt_align)
-   {
- tgt = gomp_alloca (tgt_size + tgt_align - 1);
- uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
- if (al)
-   tgt += tgt_align - al;
- tgt_size = 0;
- 

Re: [gomp4.1] Doacross tweaks

2015-09-30 Thread Ilya Verbin
Hi!

On Fri, Sep 25, 2015 at 18:54:47 +0200, Jakub Jelinek wrote:
> --- gcc/tree-pretty-print.c.jj2015-09-03 16:35:58.0 +0200
> +++ gcc/tree-pretty-print.c   2015-09-25 15:04:46.911844111 +0200
> @@ -569,7 +569,9 @@ dump_omp_clause (pretty_printer *pp, tre
>   if (TREE_PURPOSE (t) != integer_zero_node)
> {
>   tree p = TREE_PURPOSE (t);
> - if (!wi::neg_p (p, TYPE_SIGN (TREE_TYPE (p))))
> + if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (t))
> +   pp_minus (pp);
> + else
> pp_plus (pp);
>   dump_generic_node (pp, TREE_PURPOSE (t), spc, flags,
>  false);

This caused a warning:

gcc/tree-pretty-print.c: In function ‘void dump_omp_clause(pretty_printer*, tree, int, int)’:
gcc/tree-pretty-print.c:571:12: error: unused variable ‘p’ [-Werror=unused-variable]
   tree p = TREE_PURPOSE (t);
^

  -- Ilya


Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Ilya Verbin
On Tue, Sep 29, 2015 at 09:01:33 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> > Currently the COI emulator is single-threaded, i.e. it is able to run only 
> > one
> > target function at a time, e.g. the following testcase:
> > 
> >   #pragma omp parallel sections num_threads(2)
> > {
> >   #pragma omp section
> >   #pragma omp target
> >   while (1)
> > putchar ('.');
> > 
> >   #pragma omp section
> >   #pragma omp target
> >   while (1)
> > putchar ('o');
> > }
> > 
> > prints only dots using emul, while using real libcoi it prints:
> > ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> > Of course, it's not possible to test new OpenMP 4.1's async features using 
> > such
> > an emulator.
> > 
> > The patch below makes it asynchronous: it creates an auxiliary thread for 
> > each
> > COIPipeline in host and in target processes.  In general, a new COIPipeline 
> > is
> > created by liboffloadmic for each host thread with offload, i.e. the example
> > above has:
> > 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> > 3 threads in the target process (1 main thread + 2 auxiliary threads).
> > An auxiliary host thread runs a target function in the new thread in target
> > process and waits for its completion.  When the function is finished, the 
> > host
> > thread signals an event and can run a callback, if it is registered.
> > liboffloadmic waits for signalled events by calling COIEventWait.
> > This is identical to how real libcoi works.
> > 
> > make check-target-libgomp and some internal tests did not show any 
> > regression.
> > TSan report is clean.  Is it OK for trunk?
> 
> For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
> on the host side, just whatever thread is asked to send something to/from
> the device, wait for something and/or poll for something just polling the
> pipes.  Are there auxiliary host threads also for the case when using
> the real COI, offloading to hw?

Yes.

  -- Ilya


[PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-28 Thread Ilya Verbin
Hi!

Currently the COI emulator is single-threaded, i.e. it is able to run only one
target function at a time, e.g. the following testcase:

  #pragma omp parallel sections num_threads(2)
{
  #pragma omp section
  #pragma omp target
  while (1)
putchar ('.');

  #pragma omp section
  #pragma omp target
  while (1)
putchar ('o');
}

prints only dots using emul, while using real libcoi it prints:
...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
Of course, it's not possible to test new OpenMP 4.1's async features using such
an emulator.

The patch below makes it asynchronous: it creates an auxiliary thread for each
COIPipeline in host and in target processes.  In general, a new COIPipeline is
created by liboffloadmic for each host thread with offload, i.e. the example
above has:
4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
3 threads in the target process (1 main thread + 2 auxiliary threads).
An auxiliary host thread runs a target function in the new thread in target
process and waits for its completion.  When the function is finished, the host
thread signals an event and can run a callback, if it is registered.
liboffloadmic waits for signalled events by calling COIEventWait.
This is identical to how real libcoi works.

make check-target-libgomp and some internal tests did not show any regression.
TSan report is clean.  Is it OK for trunk?
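
The event flow described above can be sketched roughly as follows.  This is a
simplified model, not the emulator code: an auxiliary thread runs the function,
signals an event, and then invokes an optional callback, while the waiter blocks
on the event.  The demo_* names are hypothetical, and the pipes and the separate
target process are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion event with an optional callback.  */
struct demo_event
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  bool signalled;
  void (*callback) (void *);
  void *callback_data;
};

/* One queued run-function request.  */
struct demo_request
{
  void (*fn) (void *);
  void *arg;
  struct demo_event *event;
};

/* Auxiliary thread: run the function, signal the event, then invoke
   the callback if one was registered.  */
static void *
demo_pipeline_thread (void *data)
{
  struct demo_request *req = data;
  struct demo_event *ev = req->event;

  req->fn (req->arg);

  pthread_mutex_lock (&ev->lock);
  ev->signalled = true;
  void (*cb) (void *) = ev->callback;
  void *cb_data = ev->callback_data;
  pthread_cond_broadcast (&ev->cond);
  pthread_mutex_unlock (&ev->lock);

  if (cb != NULL)
    cb (cb_data);
  return NULL;
}

/* Rough analogue of waiting with an infinite timeout.  */
static void
demo_event_wait (struct demo_event *ev)
{
  pthread_mutex_lock (&ev->lock);
  while (!ev->signalled)
    pthread_cond_wait (&ev->cond, &ev->lock);
  pthread_mutex_unlock (&ev->lock);
}

static void
demo_target_fn (void *arg)
{
  *(int *) arg += 1;
}

static void
demo_on_done (void *arg)
{
  *(int *) arg = 1;
}

/* Run demo_target_fn asynchronously and wait for its event.  Returns
   the updated value, or -1 if the thread could not be created.  */
static int
demo_run (int *callback_seen)
{
  int value = 41;
  struct demo_event ev;
  pthread_t thr;

  pthread_mutex_init (&ev.lock, NULL);
  pthread_cond_init (&ev.cond, NULL);
  ev.signalled = false;
  ev.callback = demo_on_done;
  ev.callback_data = callback_seen;

  struct demo_request req = { demo_target_fn, &value, &ev };
  if (pthread_create (&thr, NULL, demo_pipeline_thread, &req) != 0)
    return -1;

  demo_event_wait (&ev);
  /* Joining also guarantees that the callback has finished.  */
  pthread_join (thr, NULL);

  pthread_cond_destroy (&ev.cond);
  pthread_mutex_destroy (&ev.lock);
  return value;
}
```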


liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (OFFLOAD_ACTIVE_WAIT_ENV): New
define.
(init): Set OFFLOAD_ACTIVE_WAIT env var to 0, if it is not set.
* runtime/emulator/coi_common.h (PIPE_HOST_PATH): Replace with ...
(PIPE_HOST2TGT_NAME): ... this.
(PIPE_TARGET_PATH): Replace with ...
(PIPE_TGT2HOST_NAME): ... this.
(MALLOCN): New define.
(READN): Likewise.
(WRITEN): Likewise.
(enum cmd_t): Replace CMD_RUN_FUNCTION with CMD_PIPELINE_RUN_FUNCTION.
Add CMD_PIPELINE_CREATE, CMD_PIPELINE_DESTROY.
* runtime/emulator/coi_device.cpp (engine_dir): New static variable.
(pipeline_thread_routine): New static function.
(COIProcessWaitForShutdown): Use global engine_dir instead of mic_dir.
Rename pipe_host and pipe_target to pipe_host2tgt and pipe_tgt2host.
If cmd is CMD_PIPELINE_CREATE, create a new thread for the pipeline.
Remove cmd == CMD_RUN_FUNCTION case.
* runtime/emulator/coi_device.h (COIERRORN): New define.
* runtime/emulator/coi_host.cpp: Include set, map, queue.
Replace typedefs with enums and structs.
(struct Function): Remove name, add num_buffers, bufs_size,
bufs_data_target, misc_data_len, misc_data, return_value_len,
return_value, completion_event.
(struct Callback): New.
(struct Process): Remove pipeline.  Add pipe_host2tgt and pipe_tgt2host.
(struct Pipeline): Remove pipe_host and pipe_target.  Add thread,
destroy, is_destroyed, pipe_host2tgt_path, pipe_tgt2host_path,
pipe_host2tgt, pipe_tgt2host, queue, process.
(max_pipeline_num): New static variable.
(pipelines): Likewise.
(max_event_num): Likewise.
(non_signalled_events): Likewise.
(errored_events): Likewise.
(callbacks): Likewise.
(cleanup): Do not check tmp_dirs before free.
(start_critical_section): New static function.
(finish_critical_section): Likewise.
(pipeline_is_destroyed): Likewise.
(maybe_invoke_callback): Likewise.
(signal_event): Likewise.
(get_event_result): Likewise.
(COIBufferCopy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.  Signal completion
event.
(COIBufferCreate): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferCreateFromMemory): Rename arguments according to headers.
Add asserts.
(COIBufferDestroy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferGetSinkAddress): Rename arguments according to headers.
Add asserts.
(COIBufferMap): Rename arguments according to headers.  Add asserts.
Signal completion event.
(COIBufferRead): Likewise.
(COIBufferSetState): Likewise.
(COIBufferUnmap): Likewise.
(COIBufferWrite): Likewise.
(COIEngineGetCount): Add assert.
(COIEngineGetHandle): Rename arguments according to headers.
Add assert.
(COIEventWait): Rename arguments according to headers.  Add asserts.
Implement waiting for events with zero or infinite timeout.
(COIEventRegisterCallback): New function.
(pipeline_thread_routine): New static function.

[PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
Committed to trunk as obvious.

PR other/67652
* runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.

diff --git a/liboffloadmic/runtime/offload_engine.cpp 
b/liboffloadmic/runtime/offload_engine.cpp
index 16b440d..00b673a 100644
--- a/liboffloadmic/runtime/offload_engine.cpp
+++ b/liboffloadmic/runtime/offload_engine.cpp
@@ -173,7 +173,7 @@ void Engine::init_process(void)
 // use putenv instead of setenv as Windows has no setenv.
 // Note: putenv requires its argument can't be freed or modified.
 // So no free after call to putenv or elsewhere.
-char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 
1));
+char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
 sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
 putenv(env_var);  
 }
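
For the record, a small sketch of why the old expression was wrong (the demo_*
names are hypothetical).  Adding 1 to a string literal makes it decay to a
pointer, so sizeof yields the pointer size rather than the array size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ENV "COI_DMA_CHANNEL_COUNT=2"

/* The old expression: the literal decays to a pointer before the
   addition, so this is the size of a pointer.  Some compilers warn
   about this construct.  */
static size_t
demo_buggy_size (void)
{
  return sizeof (DEMO_ENV + 1);
}

/* The fixed expression: the size of the array, including the
   terminating NUL.  */
static size_t
demo_fixed_size (void)
{
  return sizeof (DEMO_ENV);
}
```

On a typical 64-bit target the first function returns 8 while the string needs
24 bytes, so the following sprintf wrote past the end of the allocation.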

  -- Ilya


Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 18:15:14 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 07:10:13PM +0300, Ilya Verbin wrote:
> > Committed to trunk as obvious.
> > 
> > PR other/67652
> > * runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.
> > 
> > diff --git a/liboffloadmic/runtime/offload_engine.cpp 
> > b/liboffloadmic/runtime/offload_engine.cpp
> > index 16b440d..00b673a 100644
> > --- a/liboffloadmic/runtime/offload_engine.cpp
> > +++ b/liboffloadmic/runtime/offload_engine.cpp
> > @@ -173,7 +173,7 @@ void Engine::init_process(void)
> >  // use putenv instead of setenv as Windows has no setenv.
> >  // Note: putenv requires its argument can't be freed or 
> > modified.
> >  // So no free after call to putenv or elsewhere.
> > -char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
> > +char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
> >  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
> >  putenv(env_var);  
> 
> Missing error handling if malloc returns NULL?

Yes :(
I will grep all mallocs/reallocs one more time.

  -- Ilya
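
One possible shape for the missing NULL check is sketched below. This is not the change that was later committed; set_env_persistent is a hypothetical helper invented for this note, and it assumes a POSIX putenv declared in <stdlib.h> (Windows spells it _putenv).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdlib.h>  // malloc, free, getenv, and POSIX putenv

// Hypothetical helper, not part of liboffloadmic: heap-copy a "NAME=value"
// string and hand it to putenv.  Returns false if malloc or putenv fails.
// On success the buffer is deliberately never freed, because putenv keeps
// the pointer as part of the environment.
static bool set_env_persistent(const char *assignment) {
    assert(assignment != nullptr);
    const std::size_t size = std::strlen(assignment) + 1;  // include the NUL
    char *env_var = static_cast<char *>(malloc(size));
    if (env_var == nullptr)
        return false;  // out of memory: let the caller decide how to report it
    std::memcpy(env_var, assignment, size);
    if (putenv(env_var) != 0) {
        free(env_var);  // the string was not added, so it is still ours
        return false;
    }
    return true;
}
```

A caller would test the return value of set_env_persistent("COI_DMA_CHANNEL_COUNT=2") and report the failure in whatever way the surrounding runtime normally does.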


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 12:09:19 +0200, Bernd Schmidt wrote:
> On 09/28/2015 12:03 PM, Bernd Schmidt wrote:
> >On 09/28/2015 10:26 AM, Thomas Schwinge wrote:
> >>-  objcopy_argv[8] = NULL;
> >>+  objcopy_argv[objcopy_argc++] = NULL;
> >>+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);
> >
> >On its own this is not an improvement - you're trading a compile time
> >error for a runtime error. So, what is the other change this is
> >preparing for?
> 
> Ok, I now see the other patch. But I also see that other code in the same
> file and in the nvptx mkoffload is using the obstack_ptr_grow method to
> build argv arrays, I think that would be preferable to this.

I've removed obstack_ptr_grow for arrays with known sizes after this review:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html

  -- Ilya
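
To make the trade-off discussed above concrete, here is a minimal sketch of the "fixed array plus counter plus bound check" pattern from Thomas's hunk. It is only an illustration: ArgvBuilder, kArgcMax and the argument strings are placeholders invented for this note, and a plain assert stands in for gcc_checking_assert.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical upper bound, playing the role of OBJCOPY_ARGC_MAX.
constexpr std::size_t kArgcMax = 8;

// Fixed-capacity argv with a running counter instead of hard-coded indices.
struct ArgvBuilder {
    const char *argv[kArgcMax];
    std::size_t argc = 0;

    void push(const char *arg) {
        assert(argc < kArgcMax);  // overflow is caught at run time only
        argv[argc++] = arg;
    }
};

// Build an argument vector whose length depends on a run-time condition,
// which is the case where hard-coded indices stop working.
inline std::size_t build_example_argv(ArgvBuilder &b, bool with_optional) {
    b.push("tool");                 // placeholder program name
    if (with_optional)
        b.push("--optional-flag");  // placeholder, not a real objcopy option
    b.push("input");
    b.push("output");
    b.push(nullptr);                // argv terminator
    return b.argc;
}
```

As Bernd notes, the bound is only checked when the code runs, whereas indexing a correctly sized array with constants is visible at compile time; a growable container avoids the bound altogether.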


Re: libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Ilya Verbin
On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin <iver...@gmail.com> wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> What about the scenario where one thread is inside
> GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> shared library with such a mkoffload-generated constructor) and is
> modifying offload_images with register_lock held, and another thread is
> inside a GOMP_target* construct -> gomp_init_device and is accessing
> offload_images without register_lock held?  Or, why isn't that a
> reachable scenario?
> 
> Would the following patch (untested) do the right thing (locking added to
> gomp_init_device and gomp_unload_device)?  We can then also remove the
> is_register_lock parameter from gomp_load_image_to_device, and simplify
> the code.

Looks like you're right, and this scenario is possible.

  -- Ilya
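
The locking rule Thomas proposes can be sketched roughly as follows: every path that touches the image list takes the same lock, both the registration path and the device-initialization path. This is an illustration under assumptions, not libgomp code: libgomp is C and uses its own mutex type, while the sketch uses std::mutex, and ImageRegistry and its members are hypothetical names that only echo register_lock and offload_images.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for libgomp's global offload image table.
struct ImageRegistry {
    std::mutex register_lock;
    std::vector<const void *> offload_images;

    // Registration path, e.g. reached from a shared-library constructor.
    void register_image(const void *image) {
        std::lock_guard<std::mutex> guard(register_lock);
        offload_images.push_back(image);
    }

    // Device-initialization path: takes the same lock before walking the
    // list, so it can never observe it in the middle of a reallocation.
    std::size_t init_device() {
        std::lock_guard<std::mutex> guard(register_lock);
        std::size_t loaded = 0;
        for (const void *image : offload_images) {
            assert(image != nullptr);
            ++loaded;  // a real plugin would load the image here
        }
        return loaded;
    }
};
```

Because both paths acquire the lock themselves, a helper called from either one no longer needs a flag saying whether the lock is already held, which is the simplification suggested for gomp_load_image_to_device.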

