Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-04-20 Thread Gerald Pfeifer
On Mon, 20 Apr 2015, Thomas Schwinge wrote:
 Gerald, is it OK to commit the following to update GCC 5 changes' »New
 Languages and Language specific improvements« section?

Sure thing.

Gerald

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-04-20 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
 been contributing!
 
 Note that this is an experimental feature, incomplete, and subject to
 change in future versions of GCC.  We shall update -- and keep updated --
 https://gcc.gnu.org/wiki/OpenACC, to track the current status.

(This has now happened, finally...)

Gerald, is it OK to commit the following to update GCC 5 changes' »New
Languages and Language specific improvements« section?

Index: htdocs/gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.109
diff -u -p -r1.109 changes.html
--- htdocs/gcc-5/changes.html   20 Apr 2015 08:22:35 -  1.109
+++ htdocs/gcc-5/changes.html   20 Apr 2015 14:20:54 -
@@ -193,6 +193,12 @@
  liCard emulator./li
/ul
 /li
+li id=openacc
+  GCC 5 includes a preliminary implementation of the OpenACC 2.0a
+  specification.  OpenACC is intended for programming accelerator devices
+  such as GPUs.  See a href=https://gcc.gnu.org/wiki/OpenACC;the OpenACC
+  wiki page/a for more information.
+/li
   /ul
 
 


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742)

2015-04-17 Thread Julian Brown
On Tue, 14 Apr 2015 15:15:02 +0100
Julian Brown jul...@codesourcery.com wrote:

 On Wed, 8 Apr 2015 17:58:56 +0300
 Ilya Verbin iver...@gmail.com wrote:
 
  On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
   This version is mostly the same as the last posted version but
   has a tweak in GOACC_parallel to account for the new splay tree
   arrangement for target functions:
   
   -  tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start;
   +  tgt_fn = (void (*)) tgt_fn_key-tgt_offset;
   
   Have there been any other changes I might have missed?
  
  No.
  
   It passes libgomp testing on NVPTX. OK?
  
  Have you tested it with disabled offloading?
  
  I see several regressions:
  FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
  -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
  FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
  -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 
 I think there may be multiple issues here. The attached patch
 addresses one -- acc_device_type not distinguishing between
 offloaded and host code with the host_nonshm plugin.

The patch appears to fix the original issue after all: I've re-run
tests with host==target and the failures no longer appear. Also the
same has been noted by Dominique d'Humieres in PR65742.

 The other problem is that it appears that the ACC_DEVICE_TYPE
 environment variable is not getting set properly on the target for
 (any of) the OpenACC tests: this means a lot of the time the wrong
 plugin is being tested, and means that the above tests (and several
 others) still fail. That will apparently need some more engineering
 (on our part).

Fixing this turns out to require more DejaGNU-fu than I have: AFAICT,
setting a per-test environment variable from an .exp file can't easily
be done at present. The potentially useful-looking
{dg-}set-target-env-var doesn't look quite suitable for this purpose,
and besides which doesn't actually seem to be implemented for host !=
target anyway.

(At least, if this fragment of gcc-dg.exp is anything to go by:

   if { [info exists set_target_env_var] \
 [llength $set_target_env_var] != 0 } {
 if { [is_remote target] } {
   return [list unsupported ]
 } ...
).

So: OK for trunk?

Thanks,

Julian

 ChangeLog
 
 libgomp/
 * oacc-init.c (acc_on_device): Check whether we're in an offloaded
 region for host_nonshm plugin.
 * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
 nonshm_exec flag in thread-local data.
 (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
 data for host_nonshm plugin.
 (+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local
 data for host_nonshm plugin.
 * plugin/plugin-host.h: New.


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-15 Thread Dominique Dhumieres
 (Not asking for review just yet, JFYI.)

This is not a review! but the patch fixes PR65742.

Dominique


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-14 Thread Julian Brown
On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin iver...@gmail.com wrote:

 On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
  This version is mostly the same as the last posted version but has a
  tweak in GOACC_parallel to account for the new splay tree
  arrangement for target functions:
  
  -  tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start;
  +  tgt_fn = (void (*)) tgt_fn_key-tgt_offset;
  
  Have there been any other changes I might have missed?
 
 No.
 
  It passes libgomp testing on NVPTX. OK?
 
 Have you tested it with disabled offloading?
 
 I see several regressions:
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

I think there may be multiple issues here. The attached patch addresses
one -- acc_device_type not distinguishing between offloaded and host
code with the host_nonshm plugin.

The other problem is that it appears that the ACC_DEVICE_TYPE
environment variable is not getting set properly on the target for (any
of) the OpenACC tests: this means a lot of the time the wrong plugin
is being tested, and means that the above tests (and several others)
still fail. That will apparently need some more engineering (on our
part).

(Not asking for review just yet, JFYI.)

Julian

ChangeLog

libgomp/
* oacc-init.c (acc_on_device): Check whether we're in an offloaded
region for host_nonshm plugin.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
nonshm_exec flag in thread-local data.
(GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
data for host_nonshm plugin.
(+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
for host_nonshm plugin.
* plugin/plugin-host.h: New.Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 221922)
+++ libgomp/oacc-init.c	(working copy)
@@ -29,6 +29,7 @@
 #include libgomp.h
 #include oacc-int.h
 #include openacc.h
+#include plugin/plugin-host.h
 #include assert.h
 #include stdlib.h
 #include strings.h
@@ -548,7 +549,14 @@ ialias (acc_set_device_num)
 int
 acc_on_device (acc_device_t dev)
 {
-  if (acc_get_device_type () == acc_device_host_nonshm)
+  struct goacc_thread *thr = goacc_thread ();
+
+  /* We only want to appear to be the host_nonshm plugin from offloaded
+ code -- i.e. within a parallel region.  Test a flag set by the
+ openacc_parallel hook of the host_nonshm plugin to determine that.  */
+  if (acc_get_device_type () == acc_device_host_nonshm
+   thr  thr-target_tls
+   ((struct nonshm_thread *)thr-target_tls)-nonshm_exec)
 return dev == acc_device_host_nonshm || dev == acc_device_not_host;
 
   /* Just rely on the compiler builtin.  */
Index: libgomp/plugin/plugin-host.c
===
--- libgomp/plugin/plugin-host.c	(revision 221922)
+++ libgomp/plugin/plugin-host.c	(working copy)
@@ -44,6 +44,7 @@
 #include stdlib.h
 #include string.h
 #include stdio.h
+#include stdbool.h
 
 #ifdef HOST_NONSHM_PLUGIN
 #define STATIC
@@ -55,6 +56,10 @@
 #define SELF host: 
 #endif
 
+#ifdef HOST_NONSHM_PLUGIN
+#include plugin-host.h
+#endif
+
 STATIC const char *
 GOMP_OFFLOAD_get_name (void)
 {
@@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn
 			   void *targ_mem_desc __attribute__ ((unused)))
 {
 #ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
+  thd-nonshm_exec = true;
   fn (devaddrs);
+  thd-nonshm_exec = false;
 #else
   fn (hostaddrs);
 #endif
@@ -232,11 +240,20 @@ STATIC void *
 GOMP_OFFLOAD_openacc_create_thread_data (int ord
 	 __attribute__ ((unused)))
 {
+#ifdef HOST_NONSHM_PLUGIN
+  struct nonshm_thread *thd
+= GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread));
+  thd-nonshm_exec = false;
+  return thd;
+#else
   return NULL;
+#endif
 }
 
 STATIC void
-GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data
-	  __attribute__ ((unused)))
+GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data)
 {
+#ifdef HOST_NONSHM_PLUGIN
+  free (tls_data);
+#endif
 }
Index: libgomp/plugin/plugin-host.h
===
--- libgomp/plugin/plugin-host.h	(revision 0)
+++ libgomp/plugin/plugin-host.h	(revision 0)
@@ -0,0 +1,37 @@
+/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Julian Brown
On Tue, 7 Apr 2015 17:26:45 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote:
  On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
   LGTM with proper ChangeLog entry.
  
  I've commited this patch into trunk.
  
  Julian, you probably want to update the nvptx plugin.
 
 Note that as the number of P1s without posted fixes is now zero, it is
 likely RC1 will be done this week, so if you want nvptx working in
 GCC 5, please post a fix as soon as possible.

This version is mostly the same as the last posted version but has a
tweak in GOACC_parallel to account for the new splay tree arrangement
for target functions:

-  tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start;
+  tgt_fn = (void (*)) tgt_fn_key-tgt_offset;

Have there been any other changes I might have missed?

It passes libgomp testing on NVPTX. OK?

Thanks,

Juliancommit ac06b5e25e170061bb9855b9ea4b8e5696816bf1
Author: Julian Brown jul...@codesourcery.com
Date:   Tue Apr 7 09:23:58 2015 -0700

NVPTX load/unload and init-rework patch.

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
 tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ;\n\n);
   fprintf (out, static const char *var_mappings[] = {\n);
-  for (id_map *id = var_ids; id; id = id-next)
+  for (id_map *id = var_ids; id; id = id-next, nvars++)
 fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : );
   fprintf (out, };\n\n);
   fprintf (out, static const char *func_mappings[] = {\n);
-  for (id_map *id = func_ids; id; id = id-next)
+  for (id_map *id = func_ids; id; id = id-next, nfuncs++)
 fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : );
   fprintf (out, };\n\n);
 
   fprintf (out, static const void *target_data[] = {\n);
-  fprintf (out,   ptx_code, var_mappings, func_mappings\n);
+  fprintf (out,   ptx_code, (void*) %u, var_mappings, (void*) %u, 
+		func_mappings\n, nvars, nfuncs);
   fprintf (out, };\n\n);
 
   fprintf (out, extern void GOMP_offload_register (const void *, int, void *);\n);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the outer struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the outer struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		 unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
http://www.gnu.org/licenses/.  */
 
-
+#include assert.h
 #include openacc.h
 #include libgomp.h
 #include oacc-int.h
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async  acc_async_sync)
 gomp_fatal (invalid async argument: %d, async);
 
-  return base_dev-openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr-dev)
+gomp_fatal (no device active);
+
+  return thr-dev-openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev-openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr-dev)
+gomp_fatal (no device active);
+
+  return thr-dev-openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async  acc_async_sync)

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Ilya Verbin
On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote:
 This version is mostly the same as the last posted version but has a
 tweak in GOACC_parallel to account for the new splay tree arrangement
 for target functions:
 
 -  tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start;
 +  tgt_fn = (void (*)) tgt_fn_key-tgt_offset;
 
 Have there been any other changes I might have missed?

No.

 It passes libgomp testing on NVPTX. OK?

Have you tested it with disabled offloading?

I see several regressions:
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Jakub Jelinek
On Wed, Apr 08, 2015 at 03:31:42PM +0100, Julian Brown wrote:
 It passes libgomp testing on NVPTX. OK?

Please write a proper ChangeLog entry for it.
Ok with that.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-08 Thread Julian Brown
On Wed, 8 Apr 2015 17:58:56 +0300
Ilya Verbin iver...@gmail.com wrote:

 Have you tested it with disabled offloading?
 
 I see several regressions:
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

No -- thanks for the note. I've committed the patch now, but I'll try
to get to looking at these in the next day or two (it's probably
something relatively minor, I guess).

Julian


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-07 Thread Jakub Jelinek
On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote:
 On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
  LGTM with proper ChangeLog entry.
 
 I've commited this patch into trunk.
 
 Julian, you probably want to update the nvptx plugin.

Note that as the number of P1s without posted fixes is now zero, it is
likely RC1 will be done this week, so if you want nvptx working in GCC 5,
please post a fix as soon as possible.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-06 Thread Ilya Verbin
On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
 LGTM with proper ChangeLog entry.

I've commited this patch into trunk.

Julian, you probably want to update the nvptx plugin.


gcc/
* config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call
GOMP_offload_unregister from the destructor.
libgomp/
* libgomp-plugin.h (struct mapping_table): Replace with addr_pair.
* libgomp.h (struct gomp_memory_mapping): Remove.
(struct target_mem_desc): Change type of mem_map from
gomp_memory_mapping * to splay_tree_s *.
(struct gomp_device_descr): Remove register_image_func, get_table_func.
Add load_image_func, unload_image_func.
Change type of mem_map from gomp_memory_mapping to splay_tree_s.
Remove offload_regions_registered.
(gomp_init_tables): Remove.
(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
to splay_tree_s *.
* libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister.
* oacc-host.c (host_dispatch): Do not initialize register_image_func,
get_table_func, mem_map.is_initialized, mem_map.splay_tree.root,
offload_regions_registered.
Initialize load_image_func, unload_image_func, mem_map.root.
(goacc_host_init): Do not initialize host_dispatch.mem_map.lock.
* oacc-init.c (lazy_open): Don't call gomp_init_tables.
(acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's.
* oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of
gomp_memory_mapping *.  Use dev's lock and splay_tree.
(lookup_dev): Use dev's lock.
(acc_deviceptr): Pass dev to lookup_host instead of mem_map.
(acc_is_present): Likewise.
(acc_map_data): Likewise.
(acc_unmap_data): Likewise.  Use dev's lock.
(present_create_copy): Likewise.
(delete_copyout): Pass dev to lookup_host instead of mem_map.
(update_dev_host): Likewise.
(gomp_acc_remove_pointer): Likewise.  Use dev's lock.
* oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree.
* plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove.
(GOMP_OFFLOAD_get_table): Remove
(GOMP_OFFLOAD_load_image): New function.
(GOMP_OFFLOAD_unload_image): New function.
* target.c (register_lock): New mutex for offload image registration.
(num_devices): Do not guard with PLUGIN_SUPPORT.
(gomp_realloc_unlock): New static function.
(gomp_map_vars_existing): Add device descriptor argument.  Unlock mutex
before gomp_fatal.
(gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's.
Pass devicep to gomp_map_vars_existing.  Unlock mutex before gomp_fatal.
(gomp_copy_from_async): Use dev's lock and splay_tree instead of
mem_map's.
(gomp_unmap_vars): Likewise.
(gomp_update): Remove gomp_memory_mapping argument.  Use dev's lock and
splay_tree instead of mm's.  Unlock mutex before gomp_fatal.
(gomp_offload_image_to_device): New static function.
(GOMP_offload_register): Add mutex lock.
Call gomp_offload_image_to_device for all initialized devices.
Replace gomp_realloc with gomp_realloc_unlock.
(GOMP_offload_unregister): New function.
(gomp_init_tables): Replace with gomp_init_device.  Replace a call to
get_table_func from the plugin with calls to init_device_func and
gomp_offload_image_to_device.
(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
to splay_tree_s *.
(GOMP_target): Do not call gomp_init_tables.  Use dev's lock and
splay_tree instead of mem_map's.  Unlock mutex before gomp_fatal.
(GOMP_target_data): Do not call gomp_init_tables.
(GOMP_target_update): Likewise.  Remove argument from gomp_update.
(gomp_load_plugin_for_device): Replace register_image and get_table
with load_image and unload_image in DLSYM ().
(gomp_register_images_for_device): Remove function.
(gomp_target_init): Do not initialize current_device.mem_map.*,
current_device.offload_regions_registered.
Remove call to gomp_register_images_for_device.
Do not free offload_images and num_offload_images.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp: Include map.
(AddrVect, DevAddrVect, ImgDevAddrMap): New typedefs.
(num_devices, num_images, address_table): New static vars.
(num_libraries, lib_descrs): Remove static vars.
(set_mic_lib_path): Rename to ...
(init): ... this.  Allocate address_table and get num_devices.
(GOMP_OFFLOAD_get_num_devices): return num_devices.
(load_lib_and_get_table): Remove static function.
(offload_image): New static function.
(GOMP_OFFLOAD_get_table): Remove function.

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-01 Thread Ilya Verbin
On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote:
 On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
  +/* Similar to gomp_fatal, but release mutexes before.  */
  +
  +static void
  +gomp_fatal_unlock (const char *fmt, ...)
  +{
  +  int i;
  +  va_list list;
  +
  +  for (i = 0; i  num_devices; i++)
  +gomp_mutex_unlock (devices[i].lock);
 
 This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
 locked is undefined behavior.
 You really should unlock it in the caller which should be aware which 0/1/2
 locks it holds.

I was worried about the following scenario:
1. Thread 1 in GOMP_target locks device 1.
2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
3. GOMP_offload_unregister will wait for device 1, even device 2 is unlocked.
Anyway, it was a bad idea to unlock mutexes from non-owner thread.

Here is patch, which unlocks proper mutexes in the caller, as you suggested.
make check-target-libgomp passed.


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   #ifdef __cplusplus\n
   extern \C\\n
   #endif\n
-  void GOMP_offload_register (void *, int, void *);\n\n
+  void GOMP_offload_register (void *, int, void *);\n
+  void GOMP_offload_unregister (void *, int, void *);\n\n
 
   __attribute__((constructor))\n
   static void\n
   init (void)\n
   {\n
 GOMP_offload_register (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
+  }\n\n, GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  __attribute__((destructor))\n
+  static void\n
+  fini (void)\n
+  {\n
+GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
   }\n, GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include splay-tree.h
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */
-  /* Uses a separate lock.  */
-  struct gomp_memory_mapping mem_map;
+  /* Splay tree containing information about mapped memory regions.  */
+  struct splay_tree_s mem_map;
 
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
@@ 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-01 Thread Jakub Jelinek
On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote:
 On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote:
  On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
   +/* Similar to gomp_fatal, but release mutexes before.  */
   +
   +static void
   +gomp_fatal_unlock (const char *fmt, ...)
   +{
   +  int i;
   +  va_list list;
   +
   +  for (i = 0; i  num_devices; i++)
   +gomp_mutex_unlock (devices[i].lock);
  
  This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
  locked is undefined behavior.
  You really should unlock it in the caller which should be aware which 0/1/2
  locks it holds.
 
 I was worried about the following scenario:
 1. Thread 1 in GOMP_target locks device 1.
 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
 3. GOMP_offload_unregister will wait for device 1, even device 2 is unlocked.

How is that different from
1. Thread 1 in GOMP_target locks device 1.
2. Thread 2 calls exit.
?  I mean when you unlock the device and register locks if you own them
before gomp_fatal.

 Anyway, it was a bad idea to unlock mutexes from non-owner thread.
 
 Here is patch, which unlocks proper mutexes in the caller, as you suggested.
 make check-target-libgomp passed.

LGTM with proper ChangeLog entry.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-04-01 Thread Ilya Verbin
On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote:
 On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote:
  I was worried about the following scenario:
  1. Thread 1 in GOMP_target locks device 1.
  2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal.
  3. GOMP_offload_unregister will wait for device 1, even device 2 is 
  unlocked.
 
 How is that different from
 1. Thread 1 in GOMP_target locks device 1.
 2. Thread 2 calls exit.
 ?  I mean when you unlock the device and register locks if you own them
 before gomp_fatal.

Yeah, it's the same situation.

  Here is patch, which unlocks proper mutexes in the caller, as you suggested.
  make check-target-libgomp passed.
 
 LGTM with proper ChangeLog entry.

When should I commit it into trunk?  Without the corresponding PTX part,
offloading to PTX will not work.

  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Mon, Mar 30, 2015 at 22:42:51 +0100, Julian Brown wrote:
 On Mon, 30 Mar 2015 18:42:02 +0200
 Jakub Jelinek ja...@redhat.com wrote:
  But the one Julian posted doesn't apply on top of your patch.
  If there is any interdiff needed on top of your patch, can it be
  posted against trunk + your patch?
 
 Here's a version of my patch against trunk and Ilya's latest patch
 (hopefully!). Tests look OK (libgomp + PTX).

Thanks for rebasing!

On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
  +/* Insert mapping of host - target address pairs to splay tree.  */
  +
  +static void
  +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
  +   struct addr_pair *host_addr,
  +   struct addr_pair *tgt_addr)
  +{
  +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
  +  tgt-refcount = 1;
  +  tgt-array = gomp_malloc (sizeof (*tgt-array));
  +  tgt-tgt_start = tgt_addr-start;
  +  tgt-tgt_end = tgt_addr-end;
  +  tgt-to_free = NULL;
  +  tgt-list_count = 0;
  +  tgt-device_descr = devicep;
  +  splay_tree_node node = tgt-array;
  +  splay_tree_key k = node-key;
  +  k-host_start = host_addr-start;
  +  k-host_end = host_addr-end;
  +  k-tgt_offset = 0;
  +  k-refcount = 1;
  +  k-copy_from = false;
  +  k-tgt = tgt;
  +  node-left = NULL;
  +  node-right = NULL;
  +  splay_tree_insert (devicep-mem_map, node);
  +}
 
 What is the reason to register and allocate these one at a time, rather than
 using one struct target_mem_desc with one tgt-array for all splay tree
 nodes registered from one image?
 Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
 clear it is special) and just use tgt_offset relative to that (i.e.
 absolute), but having to malloc each node individually and having to malloc
 a target_mem_desc for each one sounds expensive.
 Everything is freed just once anyway, isn't it?

Here is WIP patch, does this look like what you suggested?  It works fine with
functions, however I'm not sure what to do with variables.  Will gomp_map_vars
work when tgt_start and tgt_end are equal to 0?

  @@ -654,6 +727,18 @@ void
   GOMP_offload_register (void *host_table, enum offload_target_type 
  target_type,
 void *target_data)
   {
  +  int i;
  +  gomp_mutex_lock (register_lock);
  +
  +  /* Load image to all initialized devices.  */
  +  for (i = 0; i  num_devices; i++)
  +{
  +  struct gomp_device_descr *devicep = devices[i];
  +  if (devicep-type == target_type  devicep-is_initialized)
  +   gomp_offload_image_to_device (devicep, host_table, target_data);
 
 Shouldn't either this function, or gomp_offload_image_to_device lock
 also devicep-lock mutex and unlock at the end?
 Where exactly I guess depends on if the devicep-* hook calls should be
 guarded with the mutex or not.  If yes, it should be this function and
 gomp_init_device.
 
  +  if (devicep-type != target_type || !devicep-is_initialized)
  +   continue;
  +
 
 Similarly.

I've added lock/unlock to GOMP_offload_register and GOMP_offload_unregister.
All calls to gomp_init_device were already guarded.

  +  devicep-unload_image_func (devicep-target_id, target_data);
  +
  +  /* Remove mapping from splay tree.  */
  +  for (j = 0; j  num_funcs; j++)
  +   {
  + struct splay_tree_key_s k;
  + k.host_start = (uintptr_t) host_func_table[j];
  + k.host_end = k.host_start + 1;
  + splay_tree_remove (devicep-mem_map, k);
  +   }
  +
  +  for (j = 0; j  num_vars; j++)
  +   {
  + struct splay_tree_key_s k;
  + k.host_start = (uintptr_t) host_var_table[j*2];
  + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
  + splay_tree_remove (devicep-mem_map, k);
  +   }
  +}
 
 Aren't you leaking here all the tgt-array and tgt allocations here?
 Though, if you change it to just two allocations (one tgt, one array),
 you'd need to free just once.

You're right.  I've fixed this for functions, variables are WIP.


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   #ifdef __cplusplus\n
   extern \C\\n
   #endif\n
-  void GOMP_offload_register (void *, int, void *);\n\n
+  void GOMP_offload_register (void *, int, void *);\n
+  void GOMP_offload_unregister (void *, int, void *);\n\n
 
   __attribute__((constructor))\n
   static void\n
   init (void)\n
   {\n
 GOMP_offload_register (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
+  }\n\n, GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  __attribute__((destructor))\n
+  static void\n
+  fini (void)\n
+  {\n
+GOMP_offload_unregister (__OFFLOAD_TABLE__, 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
  What is the reason to register and allocate these one at a time, rather than
  using one struct target_mem_desc with one tgt-array for all splay tree
  nodes registered from one image?
  Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
  clear it is special) and just use tgt_offset relative to that (i.e.
  absolute), but having to malloc each node individually and having to malloc
  a target_mem_desc for each one sounds expensive.
  Everything is freed just once anyway, isn't it?
 
 Here is WIP patch, does this look like what you suggested?  It works fine with
 functions, however I'm not sure what to do with variables.  Will gomp_map_vars
 work when tgt_start and tgt_end are equal to 0?

Can you explain what you are afraid of?  The mapped images (both their
mapping and unmapping) are done in pairs, and in a valid program the
addresses shouldn't be already mapped when the image is mapped in etc.
So, for gomp_map_vars, the var allocations should just be the pre-existing
mappings, i.e.
  splay_tree_key n = splay_tree_lookup (mm-splay_tree, cur_node);
  if (n)
{
  tgt-list[i] = n;
  gomp_map_vars_existing (n, cur_node, kind  typemask);
}
case and
  if (is_target)
{
  for (i = 0; i  mapnum; i++)
{
  if (tgt-list[i] == NULL)
cur_node.tgt_offset = (uintptr_t) NULL;
  else
cur_node.tgt_offset = tgt-list[i]-tgt-tgt_start
  + tgt-list[i]-tgt_offset;
  /* FIXME: see above FIXME comment.  */
  devicep-host2dev_func (devicep-target_id,
  (void *) (tgt-tgt_start
+ i * sizeof (void *)),
  (void *) cur_node.tgt_offset,
  sizeof (void *));
}
}
at the end.  tgt-list[i] will be non-NULL, tgt-list[i]-tgt-tgt_start
will be 0, but tgt-list[i]-tgt_offset will be absolute and so should DTRT.

 +  for (i = 0; i  num_vars; i++)
 +{
 +  struct addr_pair host_addr;
 +  host_addr.start = (uintptr_t) host_var_table[i*2];
 +  host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1];

Formatting, spaces around + or *.  But, as said earlier, I don't see why
this wouldn't work for variables too.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote:
 +/* Similar to gomp_fatal, but release mutexes before.  */
 +
 +static void
 +gomp_fatal_unlock (const char *fmt, ...)
 +{
 +  int i;
 +  va_list list;
 +
 +  for (i = 0; i  num_devices; i++)
 +gomp_mutex_unlock (devices[i].lock);

This is wrong.  Calling gomp_mutex_unlock on a lock that you don't have
locked is undefined behavior.
You really should unlock it in the caller which should be aware which 0/1/2
locks it holds.

 +  gomp_mutex_unlock (register_lock);

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Tue, Mar 31, 2015 at 15:07:58 +0200, Jakub Jelinek wrote:
 On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote:
   What is the reason to register and allocate these one at a time, rather 
   than
   using one struct target_mem_desc with one tgt-array for all splay tree
   nodes registered from one image?
   Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
   clear it is special) and just use tgt_offset relative to that (i.e.
   absolute), but having to malloc each node individually and having to 
   malloc
   a target_mem_desc for each one sounds expensive.
   Everything is freed just once anyway, isn't it?
  
  Here is WIP patch, does this look like what you suggested?  It works fine 
  with
  functions, however I'm not sure what to do with variables.  Will 
  gomp_map_vars
  work when tgt_start and tgt_end are equal to 0?
 
 Can you explain what you are afraid of?  The mapped images (both their
 mapping and unmapping) are done in pairs, and in a valid program the
 addresses shouldn't be already mapped when the image is mapped in etc.
 So, for gomp_map_vars, the var allocations should just be the pre-existing
 mappings, i.e.
   splay_tree_key n = splay_tree_lookup (mm-splay_tree, cur_node);
   if (n)
 {
   tgt-list[i] = n;
   gomp_map_vars_existing (n, cur_node, kind  typemask);
 }
 case and
   if (is_target)
 {
   for (i = 0; i  mapnum; i++)
 {
   if (tgt-list[i] == NULL)
 cur_node.tgt_offset = (uintptr_t) NULL;
   else
 cur_node.tgt_offset = tgt-list[i]-tgt-tgt_start
   + tgt-list[i]-tgt_offset;
   /* FIXME: see above FIXME comment.  */
   devicep-host2dev_func (devicep-target_id,
   (void *) (tgt-tgt_start
 + i * sizeof (void *)),
   (void *) cur_node.tgt_offset,
   sizeof (void *));
 }
 }
 at the end.  tgt-list[i] will be non-NULL, tgt-list[i]-tgt-tgt_start
 will be 0, but tgt-list[i]-tgt_offset will be absolute and so should DTRT.

Ok, thanks for the clarification!  Here is the new patch with variables.

Unfortunately I see 4 fails in make check-target-libgomp with PTX patch applied
on top, but with disabled offloading to PTX.
Julian, have you seen them?  All other tests passed with intelmic emul.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test

acc_on_device-1.c aborts here:
  /* Offloaded.  */
#pragma acc parallel
  {
if (acc_on_device (acc_device_none))
  abort ();


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   #ifdef __cplusplus\n
   extern \C\\n
   #endif\n
-  void GOMP_offload_register (void *, int, void *);\n\n
+  void GOMP_offload_register (void *, int, void *);\n
+  void GOMP_offload_unregister (void *, int, void *);\n\n
 
   __attribute__((constructor))\n
   static void\n
   init (void)\n
   {\n
 GOMP_offload_register (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
+  }\n\n, GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  __attribute__((destructor))\n
+  static void\n
+  fini (void)\n
+  {\n
+GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
   }\n, GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Jakub Jelinek
On Tue, Mar 31, 2015 at 09:25:26PM +0300, Ilya Verbin wrote:
 On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
  Shouldn't either this function, or gomp_offload_image_to_device lock
  also devicep-lock mutex and unlock at the end?
  Where exactly I guess depends on if the devicep-* hook calls should be
  guarded with the mutex or not.  If yes, it should be this function and
  gomp_init_device.
  
   +  if (devicep-type != target_type || !devicep-is_initialized)
   + continue;
   +
  
  Similarly.
 
 Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep-lock and then
 calls gomp_fatal, the destructors from .fini section are executed, so
 gomp_mutex_lock in GOMP_offload_unregister will wait for devicep-lock.

Thus perhaps before calling gomp_fatal you should release the device lock
(if held) and register_lock (ditto).

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote:
 Shouldn't either this function, or gomp_offload_image_to_device lock
 also devicep-lock mutex and unlock at the end?
 Where exactly I guess depends on if the devicep-* hook calls should be
 guarded with the mutex or not.  If yes, it should be this function and
 gomp_init_device.
 
  +  if (devicep-type != target_type || !devicep-is_initialized)
  +   continue;
  +
 
 Similarly.

Oops, there is a deadlock.  E.g. if gomp_map_vars locks devicep-lock and then
calls gomp_fatal, the destructors from .fini section are executed, so
gomp_mutex_lock in GOMP_offload_unregister will wait for devicep-lock.

  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-31 Thread Ilya Verbin
On Tue, Mar 31, 2015 at 19:10:36 +0300, Ilya Verbin wrote:
 Ok, thanks for the clarification!  Here is the new patch with variables.
 
 Unfortunately I see 4 fails in make check-target-libgomp with PTX patch 
 applied
 on top, but with disabled offloading to PTX.
 Julian, have you seen them?  All other tests passed with intelmic emul.
 
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c 
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c 
 -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
 
 acc_on_device-1.c aborts here:
   /* Offloaded.  */
 #pragma acc parallel
   {
 if (acc_on_device (acc_device_none))
   abort ();

And here is the next version with fixed potential deadlock in
GOMP_offload_unregister.  make check-target-libgomp also passed.
(but with PTX patch make check-target-libgomp has several fails mentioned above)


diff --git a/gcc/config/i386/intelmic-mkoffload.c 
b/gcc/config/i386/intelmic-mkoffload.c
index f93007c..e101f93 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler)
   #ifdef __cplusplus\n
   extern \C\\n
   #endif\n
-  void GOMP_offload_register (void *, int, void *);\n\n
+  void GOMP_offload_register (void *, int, void *);\n
+  void GOMP_offload_unregister (void *, int, void *);\n\n
 
   __attribute__((constructor))\n
   static void\n
   init (void)\n
   {\n
 GOMP_offload_register (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
+  }\n\n, GOMP_DEVICE_INTEL_MIC);
+
+  fprintf (src_file,
+  __attribute__((destructor))\n
+  static void\n
+  fini (void)\n
+  {\n
+GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, 
__offload_target_data);\n
   }\n, GOMP_DEVICE_INTEL_MIC);
+
   fclose (src_file);
 
   unsigned new_argc = 0;
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..a1d42c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -224,7 +224,6 @@ struct gomp_team_state
 };
 
 struct target_mem_desc;
-struct gomp_memory_mapping;
 
 /* These are the OpenMP 4.0 Internal Control Variables described in
section 2.3.1.  Those described as having one copy per task are
@@ -657,7 +656,7 @@ struct target_mem_desc {
   struct gomp_device_descr *device_descr;
 
   /* Memory mapping info for the thread that created this descriptor.  */
-  struct gomp_memory_mapping *mem_map;
+  struct splay_tree_s *mem_map;
 
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
@@ -683,20 +682,6 @@ struct splay_tree_key_s {
 
 #include splay-tree.h
 
-/* Information about mapped memory regions (per device/context).  */
-
-struct gomp_memory_mapping
-{
-  /* Mutex for operating with the splay tree and other shared structures.  */
-  gomp_mutex_t lock;
-
-  /* True when tables have been added to this memory map.  */
-  bool is_initialized;
-
-  /* Splay tree containing information about mapped memory regions.  */
-  struct splay_tree_s splay_tree;
-};
-
 typedef struct acc_dispatch_t
 {
   /* This is a linked list of data mapped using the
@@ -773,19 +758,18 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void (*run_func) (int, void *, void *);
 
-  /* Memory-mapping info for this device instance.  */

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-30 Thread Jakub Jelinek
On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
 Here is the latest patch for libgomp and mic plugin.
 make check-target-libgomp using intelmic emul passed.
 Also I used a testcase from the attachment.

This applies cleanly.

 Latest ptx part is here, I guess:
 https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html

But the one Julian posted doesn't apply on top of your patch.
If there is any interdiff needed on top of your patch, can it be
posted against trunk + your patch?

 +/* Insert mapping of host - target address pairs to splay tree.  */
 +
 +static void
 +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep,
 + struct addr_pair *host_addr,
 + struct addr_pair *tgt_addr)
 +{
 +  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
 +  tgt-refcount = 1;
 +  tgt-array = gomp_malloc (sizeof (*tgt-array));
 +  tgt-tgt_start = tgt_addr-start;
 +  tgt-tgt_end = tgt_addr-end;
 +  tgt-to_free = NULL;
 +  tgt-list_count = 0;
 +  tgt-device_descr = devicep;
 +  splay_tree_node node = tgt-array;
 +  splay_tree_key k = node-key;
 +  k-host_start = host_addr-start;
 +  k-host_end = host_addr-end;
 +  k-tgt_offset = 0;
 +  k-refcount = 1;
 +  k-copy_from = false;
 +  k-tgt = tgt;
 +  node-left = NULL;
 +  node-right = NULL;
 +  splay_tree_insert (devicep-mem_map, node);
 +}

What is the reason to register and allocate these one at a time, rather than
using one struct target_mem_desc with one tgt-array for all splay tree
nodes registered from one image?
Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it
clear it is special) and just use tgt_offset relative to that (i.e.
absolute), but having to malloc each node individually and having to malloc
a target_mem_desc for each one sounds expensive.
Everything is freed just once anyway, isn't it?

 @@ -654,6 +727,18 @@ void
  GOMP_offload_register (void *host_table, enum offload_target_type 
 target_type,
  void *target_data)
  {
 +  int i;
 +  gomp_mutex_lock (register_lock);
 +
 +  /* Load image to all initialized devices.  */
 +  for (i = 0; i  num_devices; i++)
 +{
 +  struct gomp_device_descr *devicep = devices[i];
 +  if (devicep-type == target_type  devicep-is_initialized)
 + gomp_offload_image_to_device (devicep, host_table, target_data);

Shouldn't either this function, or gomp_offload_image_to_device lock
also devicep-lock mutex and unlock at the end?
Where exactly I guess depends on if the devicep-* hook calls should be
guarded with the mutex or not.  If yes, it should be this function and
gomp_init_device.

 +  if (devicep-type != target_type || !devicep-is_initialized)
 + continue;
 +

Similarly.

 +  devicep-unload_image_func (devicep-target_id, target_data);
 +
 +  /* Remove mapping from splay tree.  */
 +  for (j = 0; j  num_funcs; j++)
 + {
 +   struct splay_tree_key_s k;
 +   k.host_start = (uintptr_t) host_func_table[j];
 +   k.host_end = k.host_start + 1;
 +   splay_tree_remove (devicep-mem_map, k);
 + }
 +
 +  for (j = 0; j  num_vars; j++)
 + {
 +   struct splay_tree_key_s k;
 +   k.host_start = (uintptr_t) host_var_table[j*2];
 +   k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1];
 +   splay_tree_remove (devicep-mem_map, k);
 + }
 +}

Aren't you leaking here all the tgt-array and tgt allocations here?
Though, if you change it to just two allocations (one tgt, one array),
you'd need to free just once.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-30 Thread Julian Brown
On Mon, 30 Mar 2015 18:42:02 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote:
  Here is the latest patch for libgomp and mic plugin.
  make check-target-libgomp using intelmic emul passed.
  Also I used a testcase from the attachment.
 
 This applies cleanly.
 
  Latest ptx part is here, I guess:
  https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html
 
 But the one Julian posted doesn't apply on top of your patch.
 If there is any interdiff needed on top of your patch, can it be
 posted against trunk + your patch?

Here's a version of my patch against trunk and Ilya's latest patch
(hopefully!). Tests look OK (libgomp + PTX).

HTH,

Juliancommit f203634ace786b5bb2fdce56f123f3fba236dda3
Author: Julian Brown jul...@codesourcery.com
Date:   Mon Mar 30 14:37:53 2015 -0700

nvptx load/unload support, init rework

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 02c44b6..dbc68bc 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -839,6 +839,7 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  unsigned int nvars = 0, nfuncs = 0;
 
   do
 tok = parse_file (tok);
@@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ;\n\n);
   fprintf (out, static const char *var_mappings[] = {\n);
-  for (id_map *id = var_ids; id; id = id-next)
+  for (id_map *id = var_ids; id; id = id-next, nvars++)
 fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : );
   fprintf (out, };\n\n);
   fprintf (out, static const char *func_mappings[] = {\n);
-  for (id_map *id = func_ids; id; id = id-next)
+  for (id_map *id = func_ids; id; id = id-next, nfuncs++)
 fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : );
   fprintf (out, };\n\n);
 
   fprintf (out, static const void *target_data[] = {\n);
-  fprintf (out,   ptx_code, var_mappings, func_mappings\n);
+  fprintf (out,   ptx_code, (void*) %u, var_mappings, (void*) %u, 
+		func_mappings\n, nvars, nfuncs);
   fprintf (out, };\n\n);
 
   fprintf (out, extern void GOMP_offload_register (const void *, int, void *);\n);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a1d42c5..5272f01 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -655,9 +655,6 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* Memory mapping info for the thread that created this descriptor.  */
-  struct splay_tree_s *mem_map;
-
   /* List of splay keys to remove (or decrease refcount)
  at the end of region.  */
   splay_tree_key list[];
@@ -691,18 +688,6 @@ typedef struct acc_dispatch_t
   /* This is guarded by the lock in the outer struct gomp_device_descr.  */
   struct target_mem_desc *data_environ;
 
-  /* Extra information required for a device instance by a given target.  */
-  /* This is guarded by the lock in the outer struct gomp_device_descr.  */
-  void *target_data;
-
-  /* Open or close a device instance.  */
-  void *(*open_device_func) (int n);
-  int (*close_device_func) (void *h);
-
-  /* Set or get the device number.  */
-  int (*get_device_num_func) (void);
-  void (*set_device_num_func) (int);
-
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
 		 unsigned short *, int, int, int, int, void *);
@@ -720,7 +705,7 @@ typedef struct acc_dispatch_t
   void (*async_set_async_func) (int);
 
   /* Create/destroy TLS data.  */
-  void *(*create_thread_data_func) (void *);
+  void *(*create_thread_data_func) (int);
   void (*destroy_thread_data_func) (void *);
 
   /* NVIDIA target specific routines.  */
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 08b7c5e..1f5827e 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -26,7 +26,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
http://www.gnu.org/licenses/.  */
 
-
+#include assert.h
 #include openacc.h
 #include libgomp.h
 #include oacc-int.h
@@ -37,13 +37,23 @@ acc_async_test (int async)
   if (async  acc_async_sync)
 gomp_fatal (invalid async argument: %d, async);
 
-  return base_dev-openacc.async_test_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr-dev)
+gomp_fatal (no device active);
+
+  return thr-dev-openacc.async_test_func (async);
 }
 
 int
 acc_async_test_all (void)
 {
-  return base_dev-openacc.async_test_all_func ();
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr-dev)
+gomp_fatal (no device active);
+
+  return thr-dev-openacc.async_test_all_func ();
 }
 
 void
@@ -52,19 +62,34 @@ acc_wait (int async)
   if (async  acc_async_sync)
 gomp_fatal (invalid async argument: %d, async);
 
-  base_dev-openacc.async_wait_func (async);
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr-dev)
+gomp_fatal (no device 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-27 Thread Julian Brown
On Thu, 26 Mar 2015 13:09:19 +0100
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
  If it is too late for such global changes (rework initialization in
  libgomp, change mic and ptx plugins), then here is a small
  workaround patch to fix offloading from libraries.  Likely, it will
  not affect OpenACC programs with one image.  make
  check-target-libgomp passed.
 
 Sorry for not getting to this earlier, really busy with severe
 regressions bugfixing lately.
 
 Anyway, IMHO it is not too late to fixing it properly, after all,
 the current code is majorly broken.  As I've said earlier, e.g. the
 lack of mutex guarding gomp_target_init (which is using pthread_once
 guaranteed to be run just once) vs. concurrent GOMP_offload_register
 calls (if those are run from ctors, then I guess something like
 dl_load_lock ensures at least on glibc that multiple
 GOMP_offload_register calls aren't performed at the same time) in
 accessing/reallocating offload_images and num_offload_images and the
 lack of support to register further images after the gomp_target_init
 call (if you dlopen further shared libraries) is really bad.  And it
 would be really nice to support the unloading.
 
 But I'm afraid I'm lost in what is the latest posted patch for that,
 and how has it been tested (whether just on MIC or MIC emul, or also
 for nvptx).
 
 So can you please post a link to the latest full patch and how it has
 been tested, and if it is still error prone if say one thread executes
 GOMP_target the first time and another at the same time dlopens some
 shared library that has offloading regions in it, fix that too?

I couldn't say about that -- I don't have all the state on the locking
problems at the moment.

 We still have a week or so to get this sorted out.

Apologies again for the delay in getting this out. Here's a current
version of the patch against the gomp4 branch (on top of Ilya's
load/unload patch) which passes testing for nvptx/openacc/libgomp,
modulo the usual (timing-related) noise in the lib-83.c test.

Thomas, can we get this tested on mainline and with MIC emulation?

This version fixes (some if not all) regressions with multiple
NVidia devices, and removes the memory-map lock and is_initialised
fields, reverting to just a splay tree in gomp_device_descr (as the
code was before the OpenACC merge). (Our multi-GPU machine is
temporarily out of action, so I can't easily test that setup at the
moment).

HTH,

Julian

ChangeLog

gcc/
* config/nvptx/mkoffload.c (process): Support variable mapping.

libgomp/
* libgomp.h (target_mem_desc: Remove mem_map field.
(struct gomp_memory_mapping): Remove.
(acc_dispatch_t): Remove open_device_func, close_device_func,
get_device_num_func, set_device_num_func, target_data members.
Change create_thread_data_func argument to device number instead of
generic pointer.
(struct gomp_device_descr): Replace mem_map field with splay tree
directly.
(gomp_free_memmap): Update prototype.
* oacc-async.c (assert.h): Include.
(acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
(acc_wait_all, acc_wait_all_async): Use current host thread's
active device, not base_dev.
* oacc-cuda.c (acc_get_current_cuda_device)
(acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_dispatch): Don't set open_device_func,
close_device_func, get_device_num_func or set_device_num_func.
(goacc_host_init): Don't initialise host_dispatch.mem_map.lock.
* oacc-init.c (base_dev, init_key): Remove.
(cached_base_dev): New.
(name_of_acc_device_t): New.
(acc_init_1): Initialise default-numbered device, not zeroth.
(acc_shutdown_1): Close all devices of a given type.
(goacc_destroy_thread): Don't use base_dev.
(lazy_open, lazy_init, lazy_init_and_open): Remove.
(goacc_attach_host_thread_to_device): New.
(acc_init): Reimplement with goacc_attach_host_thread_to_device.
(acc_get_num_devices): Don't use base_dev.
(acc_set_device_type): Reimplement.
(acc_get_device_type): Don't use base_dev.
(acc_get_device_num): Tweak logic.
(acc_set_device_num): Likewise.
(goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
(goacc_lazy_initialize): Reimplement with acc_init and
goacc_attach_host_thread_to_device.
* oacc-int.h (goacc_thread): Add base_dev field.
(base_dev): Remove extern declaration.
(goacc_attach_host_thread_to_device): Add prototype.
* oacc-mem.c (lookup_host): Change first argument to
gomp_device_descr. Use lock/splay tree from gomp_device_descr.
(lookup_dev): Use lock from devicep not mem_map.
(acc_malloc): Use current thread's device instead of base_dev.
(acc_free): Likewise.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
(acc_deviceptr, acc_map_data, acc_unmap_data, 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-26 Thread Jakub Jelinek
On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
 If it is too late for such global changes (rework initialization in libgomp,
 change mic and ptx plugins), then here is a small workaround patch to fix
 offloading from libraries.  Likely, it will not affect OpenACC programs with 
 one
 image.  make check-target-libgomp passed.

Sorry for not getting to this earlier, really busy with severe regressions
bugfixing lately.

Anyway, IMHO it is not too late to fixing it properly, after all,
the current code is majorly broken.  As I've said earlier, e.g. the lack
of mutex guarding gomp_target_init (which is using pthread_once guaranteed
to be run just once) vs. concurrent GOMP_offload_register calls
(if those are run from ctors, then I guess something like dl_load_lock
ensures at least on glibc that multiple GOMP_offload_register calls aren't
performed at the same time) in accessing/reallocating offload_images
and num_offload_images and the lack of support to register further
images after the gomp_target_init call (if you dlopen further shared
libraries) is really bad.  And it would be really nice to support the
unloading.

But I'm afraid I'm lost in what is the latest posted patch for that,
and how has it been tested (whether just on MIC or MIC emul, or also for
nvptx).

So can you please post a link to the latest full patch and how it has been
tested, and if it is still error prone if say one thread executes
GOMP_target the first time and another at the same time dlopens some shared
library that has offloading regions in it, fix that too?

We still have a week or so to get this sorted out.

Jakub


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-26 Thread Ilya Verbin
On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
 On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote:
  If it is too late for such global changes (rework initialization in libgomp,
  change mic and ptx plugins), then here is a small workaround patch to fix
  offloading from libraries.  Likely, it will not affect OpenACC programs 
  with one
  image.  make check-target-libgomp passed.
 
 Sorry for not getting to this earlier, really busy with severe regressions
 bugfixing lately.
 
 Anyway, IMHO it is not too late to fixing it properly, after all,
 the current code is majorly broken.  As I've said earlier, e.g. the lack
 of mutex guarding gomp_target_init (which is using pthread_once guaranteed
 to be run just once) vs. concurrent GOMP_offload_register calls
 (if those are run from ctors, then I guess something like dl_load_lock
 ensures at least on glibc that multiple GOMP_offload_register calls aren't
 performed at the same time) in accessing/reallocating offload_images
 and num_offload_images and the lack of support to register further
 images after the gomp_target_init call (if you dlopen further shared
 libraries) is really bad.  And it would be really nice to support the
 unloading.
 
 But I'm afraid I'm lost in what is the latest posted patch for that,
 and how has it been tested (whether just on MIC or MIC emul, or also for
 nvptx).
 
 So can you please post a link to the latest full patch and how it has been
 tested, and if it is still error prone if say one thread executes
 GOMP_target the first time and another at the same time dlopens some shared
 library that has offloading regions in it, fix that too?
 
 We still have a week or so to get this sorted out.

Here is the latest patch for libgomp and mic plugin.
make check-target-libgomp using intelmic emul passed.
Also I used a testcase from the attachment.

Latest ptx part is here, I guess:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html

Thomas, could you please test these 2 patches together on nvptx?


gcc/
* config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call
GOMP_offload_unregister from the destructor.
libgomp/
* libgomp-plugin.h (struct mapping_table): Replace with addr_pair.
* libgomp.h (struct gomp_memory_mapping): Remove.
(struct target_mem_desc): Change type of mem_map from
gomp_memory_mapping * to splay_tree_s *.
(struct gomp_device_descr): Remove register_image_func, get_table_func.
Add load_image_func, unload_image_func.
Change type of mem_map from gomp_memory_mapping to splay_tree_s.
Remove offload_regions_registered.
(gomp_init_tables): Remove.
(gomp_free_memmap): Change type of argument from gomp_memory_mapping *
to splay_tree_s *.
* libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister.
* oacc-host.c (host_dispatch): Do not initialize register_image_func,
get_table_func, mem_map.is_initialized, mem_map.splay_tree.root,
offload_regions_registered.
Initialize load_image_func, unload_image_func, mem_map.root.
(goacc_host_init): Do not initialize host_dispatch.mem_map.lock.
* oacc-init.c (lazy_open): Don't call gomp_init_tables.
(acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's.
* oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of
gomp_memory_mapping *.  Use dev's lock and splay_tree.
(lookup_dev): Use dev's lock.
(acc_deviceptr): Pass dev to lookup_host instead of mem_map.
(acc_is_present): Likewise.
(acc_map_data): Likewise.
(acc_unmap_data): Likewise.  Use dev's lock.
(present_create_copy): Likewise.
(delete_copyout): Pass dev to lookup_host instead of mem_map.
(update_dev_host): Likewise.
(gomp_acc_remove_pointer): Likewise.  Use dev's lock.
* oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree.
* plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove.
(GOMP_OFFLOAD_get_table): Remove
(GOMP_OFFLOAD_load_image): New function.
(GOMP_OFFLOAD_unload_image): New function.
* target.c (register_lock): New mutex for offload image registration.
(gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's.
(gomp_copy_from_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_update): Remove gomp_memory_mapping argument.  Use dev's lock and
splay_tree instead of mm's.
(gomp_splay_tree_insert_mapping): New static function.
(gomp_offload_image_to_device): Ditto.
(GOMP_offload_register): Add mutex lock.
Call gomp_offload_image_to_device for all initialized devices.
(GOMP_offload_unregister): New function.
(gomp_init_tables): Replace with gomp_init_device.  Replace a call to
get_table_func from the plugin with calls to init_device_func and

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-23 Thread Ilya Verbin
On Mon, Mar 09, 2015 at 14:45:55 +, Julian Brown wrote:
 On Fri, 6 Mar 2015 17:01:13 +0300
 Ilya Verbin iver...@gmail.com wrote:
 
  On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
   On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
 Julian Brown jul...@codesourcery.com wrote:
 This is a version of the previously-posted patch to rework
 initialisation and support the proposed load/unload hooks,
 merged to gomp4 branch and tested alongside the two patches
 (from
   
   Currently the 'struct gomp_memory_mapping' contains 'lock' and
   'is_initialized'. Do you still need them?  Or we can use
   gomp_device_descr::lock and is_initialized instead?  If yes, then
   we can replace the gomp_memory_mapping structure with a splay_tree,
   as it was before the OpenACC merge.
  
  Ping?
 
 Apologies, I've been distracted with travel and other things. I
 suspect, as you suggest, that the gomp_memory_mapping
 lock/is_initialized fields may no longer be required. I haven't yet had
 time to address that nor all of Thomas's comments on the patch (mostly
 breakage with multiple devices), and I'm unlikely to have time this
 week either due to vacation...

If it is too late for such global changes (rework initialization in libgomp,
change mic and ptx plugins), then here is a small workaround patch to fix
offloading from libraries.  Likely, it will not affect OpenACC programs with one
image.  make check-target-libgomp passed.


PR libgomp/65338
libgomp/
* libgomp.h (struct gomp_device_descr): Remove
offload_regions_registered.
* oacc-host.c (host_dispatch): Do not initialize
offload_regions_registered.
* target.c (gomp_register_image_for_device): Do not check for
offload_regions_registered.
(gomp_target_init): Do not initialize offload_regions_registered.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..f45fdba 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -793,9 +793,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
  members.  */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..2763f44 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch =
 .mem_map.is_initialized = false,
 .mem_map.splay_tree.root = NULL,
 .is_initialized = false,
-.offload_regions_registered = false,
 
 .openacc = {
   .open_device_func = GOMP_OFFLOAD_openacc_open_device,
diff --git a/libgomp/target.c b/libgomp/target.c
index 50baa4d..db1f509 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1035,13 +1035,8 @@ static void
 gomp_register_image_for_device (struct gomp_device_descr *device,
struct offload_image_descr *image)
 {
-  if (!device-offload_regions_registered
-   (device-type == image-type
- || device-type == OFFLOAD_TARGET_TYPE_HOST))
-{
-  device-register_image_func (image-host_table, image-target_data);
-  device-offload_regions_registered = true;
-}
+  if (device-type == image-type || device-type == OFFLOAD_TARGET_TYPE_HOST)
+device-register_image_func (image-host_table, image-target_data);
 }
 
 /* This function initializes the runtime needed for offloading.
@@ -1105,7 +1100,6 @@ gomp_target_init (void)
current_device.mem_map.is_initialized = false;
current_device.mem_map.splay_tree.root = NULL;
current_device.is_initialized = false;
-   current_device.offload_regions_registered = false;
current_device.openacc.data_environ = NULL;
current_device.openacc.target_data = NULL;
for (i = 0; i  new_num_devices; i++)


  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-09 Thread Julian Brown
On Fri, 6 Mar 2015 17:01:13 +0300
Ilya Verbin iver...@gmail.com wrote:

 On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
  On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
Julian Brown jul...@codesourcery.com wrote:
This is a version of the previously-posted patch to rework
initialisation and support the proposed load/unload hooks,
merged to gomp4 branch and tested alongside the two patches
(from
  
  Currently the 'struct gomp_memory_mapping' contains 'lock' and
  'is_initialized'. Do you still need them?  Or we can use
  gomp_device_descr::lock and is_initialized instead?  If yes, then
  we can replace the gomp_memory_mapping structure with a splay_tree,
  as it was before the OpenACC merge.
 
 Ping?

Apologies, I've been distracted with travel and other things. I
suspect, as you suggest, that the gomp_memory_mapping
lock/is_initialized fields may no longer be required. I haven't yet had
time to address that nor all of Thomas's comments on the patch (mostly
breakage with multiple devices), and I'm unlikely to have time this
week either due to vacation...

Thanks,

Julian


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-03-06 Thread Ilya Verbin
On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote:
 On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
   Julian Brown jul...@codesourcery.com wrote:
   This is a version of the previously-posted patch to rework
   initialisation and support the proposed load/unload hooks, merged to
   gomp4 branch and tested alongside the two patches (from
 
 Currently the 'struct gomp_memory_mapping' contains 'lock' and 
 'is_initialized'.
 Do you still need them?  Or we can use gomp_device_descr::lock and
 is_initialized instead?  If yes, then we can replace the gomp_memory_mapping
 structure with a splay_tree, as it was before the OpenACC merge.

Ping?

Thanks,
  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-26 Thread Ilya Verbin
Hi,

On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
  Julian Brown jul...@codesourcery.com wrote:
  This is a version of the previously-posted patch to rework
  initialisation and support the proposed load/unload hooks, merged to
  gomp4 branch and tested alongside the two patches (from

Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'.
Do you still need them?  Or we can use gomp_device_descr::lock and
is_initialized instead?  If yes, then we can replace the gomp_memory_mapping
structure with a splay_tree, as it was before the OpenACC merge.

Thanks,
  -- Ilya


Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-25 Thread Julian Brown
On Wed, 25 Feb 2015 10:36:08 +0100
Thomas Schwinge tho...@codesourcery.com wrote:

 Hi!
 
 On Tue, 24 Feb 2015 11:29:51 +, Julian Brown
 jul...@codesourcery.com wrote:
  Test results look OK, barring a suspected harness issue (lib-83
  failing with a timeout for nvptx
 
 However, I'm seeing a class of testsuite regressions: all variants of
 libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90
 FAIL: »libgomp: cuMemFreeHost error: invalid value«.  I see these two
 test cases contain a lot of acc_get_num_devices and similar calls --
 I've been testing this on our nvidiak20-2 system, which contains two
 Nvidia K20 cards, so maybe there's something wrong in that regard.
 (But why is this failing only for Fortran -- are we missing C/C++
 tests in that area?) Can you have a look, or want me to?

I can have a look at that.

  --- a/gcc/config/nvptx/mkoffload.c
  +++ b/gcc/config/nvptx/mkoffload.c
  @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
 
 fprintf (out, static const void *target_data[] = {\n);
  -  fprintf (out,   ptx_code, var_mappings, func_mappings\n);
  +  fprintf (out,   ptx_code, (void*) %u, var_mappings, (void*) %u,
  
  +   func_mappings\n, nvars, nfuncs);
 fprintf (out, };\n\n);
 
 I wondered if it's maybe more elegant to just separate those by NULL
 delimiters instead of the size integers casted to void * (spaces
 missing)?  But then, that'd need double scanning in the consumer,
 libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we
 need to allocate an appropriately sized array, so maybe your more
 expressive approach is better indeed.

Yeah, I considered both: there's probably not much to choose between
the approaches. They use the same amount of space.

  --- a/libgomp/oacc-async.c
  +++ b/libgomp/oacc-async.c
  @@ -34,44 +34,68 @@
   int
   acc_async_test (int async)
   {
  +  struct goacc_thread *thr = goacc_thread ();
  +
 if (async  acc_async_sync)
   gomp_fatal (invalid async argument: %d, async);
   
  -  return base_dev-openacc.async_test_func (async);
  +  assert (thr-dev);
  +
  +  return thr-dev-openacc.async_test_func (async);
   }

 Here, and in several other places: is this code conforming to the
 OpenACC specification?  Do we need to (lazily) initialize in all
 these places, or in goacc_thread, or gracefully fail (see below) if
 not initialized (basically in all places where you currently assert
 (thr-dev)?
 
 #include openacc.h
 
 int main(int argc, char *argv[])
 {
   return acc_async_test(0);
 }
 
 [sigsegv]

Whether it conforms to the spec or not is a hard question to answer,
because a lot of behaviour is left undefined. But here are two
possibly-useful made-up guidelines:

1. Does the program work the same with OpenACC disabled?

2. Does some strange use of OpenACC functionality (including library
   calls, etc.) probably indicate user error?

Much of the lazy initialisation code is there so that (1) can be true
-- i.e., a program can use OpenACC directives without making an
explicit call to acc_init or other API-specific initialisation code.

But this case is an explicit call to the OpenACC runtime library, so the
program can't work without -fopenacc enabled, so we can follow
guideline (2) instead. And in this case, it's meaningless to test for
completion of async operation when no device is active.

Of course though, this should be an actual error rather than a crash.
But, I don't think we want to lazily-initialise here.

 Also, I'm not sure what the expected outcome of this code sequence is:
 
 acc_init(acc_device_nvidia);
 acc_shutdown(acc_device_nvidia);
 acc_async_test(0);
 
 a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test:
 Assertion `thr-dev' failed. Aborted (core dumped)
 
 If the OpenACC specification can be read such that all this indeed is
 undefined behavior, then aborting/crashing is OK, of course.

Again, this would probably indicate user error in a real program, so it
should raise a (real) error message.

  --- a/libgomp/oacc-cuda.c
  +++ b/libgomp/oacc-cuda.c
  @@ -34,51 +34,53 @@
   void *
   acc_get_current_cuda_device (void)
   {
  -  void *p = NULL;
  +  struct goacc_thread *thr = goacc_thread ();
   
  -  if (base_dev  base_dev-openacc.cuda.get_current_device_func)
  -p = base_dev-openacc.cuda.get_current_device_func ();
  +  if (thr  thr-dev 
  thr-dev-openacc.cuda.get_current_device_func)
  +return thr-dev-openacc.cuda.get_current_device_func ();
   
  -  return p;
  +  return NULL;
   }
 
 Here, and in other places, it looks as if we'd fail gracefully.

Not sure about this (maybe it should be an error too?), but...

   int
   acc_set_cuda_stream (int async, void *stream)
   {
  -  int s = -1;
  +  struct goacc_thread *thr;
   
 if (async  0 || stream == NULL)
   return 0;
   
 goacc_lazy_initialize ();
   
  -  if (base_dev  base_dev-openacc.cuda.set_stream_func)
  -s = base_dev-openacc.cuda.set_stream_func 

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-25 Thread Ilya Verbin
On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote:
  Julian Brown jul...@codesourcery.com wrote:
  OK for gomp4 branch? I could commit Ilya's patch there too if so.
 
 I'll leave the decision to Jakub, but, what about trunk?  As Ilya
 indicated in
 http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E,
 (at least part of) these patches are fixing a regression with offloading
 From shared libraries.  (And maybe the rest qualifies as fixes and
 extensions to new code (offloading), so no danger to cause any
 regressions compared to the last GCC release?)

BTW, when I removed calls to gomp_init_tables in  
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html,
I could accidentally remove some necessary gomp_mutex_lock/unlock.
Also GOMP_offload_[un]register require some mutexes, as noted by Jakub.
I'm going to fix this.  So, I think we should commit all dependent patches to
gomp4 branch, and I will post a fix for mutexes on top of them.

Thanks,
  -- Ilya


libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-25 Thread Thomas Schwinge
Hi!

On Tue, 24 Feb 2015 11:29:51 +, Julian Brown jul...@codesourcery.com 
wrote:
 On Wed, 4 Feb 2015 15:05:45 +
 Julian Brown jul...@codesourcery.com wrote:
 
  The major changes are: [...]

Thanks for looking into this!

 This is a version of the previously-posted patch to rework
 initialisation and support the proposed load/unload hooks, merged to
 gomp4 branch and tested alongside the two patches (from
 https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading):
 
 http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E
 
 http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E
 
 As well as Ilya Verbin's patch:
 
 https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html

(I also added
http://news.gmane.org/find-root.php?message_id=%3C20141115000346.GF40445%40msticlxl57.ims.intel.com%3E
to the mix.)

 Test results look OK, barring a suspected harness issue (lib-83
 failing with a timeout for nvptx

Yes; Jim's rewriting the timing code.

However, I'm seeing a class of testsuite regressions: all variants of
libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90 FAIL:
»libgomp: cuMemFreeHost error: invalid value«.  I see these two test
cases contain a lot of acc_get_num_devices and similar calls -- I've been
testing this on our nvidiak20-2 system, which contains two Nvidia K20
cards, so maybe there's something wrong in that regard.  (But why is this
failing only for Fortran -- are we missing C/C++ tests in that area?)
Can you have a look, or want me to?

 OK for gomp4 branch? I could commit Ilya's patch there too if so.

I'll leave the decision to Jakub, but, what about trunk?  As Ilya
indicated in
http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E,
(at least part of) these patches are fixing a regression with offloading
From shared libraries.  (And maybe the rest qualifies as fixes and
extensions to new code (offloading), so no danger to cause any
regressions compared to the last GCC release?)


I have not reviewed all your changes; just a few comments:

 --- a/gcc/config/nvptx/mkoffload.c
 +++ b/gcc/config/nvptx/mkoffload.c
 @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)

fprintf (out, static const void *target_data[] = {\n);
 -  fprintf (out,   ptx_code, var_mappings, func_mappings\n);
 +  fprintf (out,   ptx_code, (void*) %u, var_mappings, (void*) %u, 
 + func_mappings\n, nvars, nfuncs);
fprintf (out, };\n\n);

I wondered if it's maybe more elegant to just separate those by NULL
delimiters instead of the size integers casted to void * (spaces
missing)?  But then, that'd need double scanning in the consumer,
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we need to
allocate an appropriately sized array, so maybe your more expressive
approach is better indeed.

 --- a/libgomp/oacc-async.c
 +++ b/libgomp/oacc-async.c
 @@ -34,44 +34,68 @@
  int
  acc_async_test (int async)
  {
 +  struct goacc_thread *thr = goacc_thread ();
 +
if (async  acc_async_sync)
  gomp_fatal (invalid async argument: %d, async);
  
 -  return base_dev-openacc.async_test_func (async);
 +  assert (thr-dev);
 +
 +  return thr-dev-openacc.async_test_func (async);
  }

(Here, and in several other places: I would have placed the declaration
of thr and its initialization just before its first use, but then, no
need to change that now.)

Here, and in several other places: is this code conforming to the OpenACC
specification?  Do we need to (lazily) initialize in all these places, or
in goacc_thread, or gracefully fail (see below) if not initialized
(basically in all places where you currently assert (thr-dev)?

#include openacc.h

int main(int argc, char *argv[])
{
  return acc_async_test(0);
}

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
-Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/ 
-Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs 
-Ibuild-gcc/x86_64-unknown-linux-gnu/./libgomp -Isource-gcc/libgomp 
-Binstall/offload-nvptx-none/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 
-Binstall/offload-nvptx-none/bin 
-Binstall/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0
 -Binstall/offload-x86_64-intelmicemul-linux-gnu/bin 
-Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs 
-Wl,-rpath,build-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Wall ../a.c 
-fopenacc -g
$ gdb -q a.out 
Reading symbols from a.out...done.
(gdb) r
Starting program: [...]/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1.

Program received signal SIGSEGV, Segmentation fault.
acc_async_test (async=0) at [...]/source-gcc/libgomp/oacc-async.c:42
42assert (thr-dev);

Also, I'm not sure what the expected outcome of this code sequence is:

acc_init(acc_device_nvidia);
acc_shutdown(acc_device_nvidia);

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-24 Thread Julian Brown
Hi,

On Wed, 4 Feb 2015 15:05:45 +
Julian Brown jul...@codesourcery.com wrote:

 The major changes are:
 
 * The removal of the OpenACC-specific plugin hooks open_device,
   close_device, set_device_num and get_device_num. The functionality
   has been moved into the init/fini hooks (for the first two) or moved
   into the target-independent OpenACC parts, respectively.
 
 * The PTX mkoffload utility has been extended to support variables as
   well as function mapping, to fill out support for the load/unload
   image hooks. (Not really tested so far!)
 
 * The plugin hooks that are shared between OpenMP and OpenACC now
   support the device number argument properly: that should help with
   (eventually) unifying the plugin interface for the two APIs. (With
   set_device_num and get_device_num removed, the plugin is stateless
   with respect to which device is currently active. The rest of the
   OpenACC hooks -- async functions, etc. -- should probably be changed
   to take a device number argument too, but that could be a follow-on
   patch.)
 
 * The limitation of having only one type of device active
 simultaneously in the OpenACC runtime has (theoretically!) been
 removed.

This is a version of the previously-posted patch to rework
initialisation and support the proposed load/unload hooks, merged to
gomp4 branch and tested alongside the two patches (from
https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading):

http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E

http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E

As well as Ilya Verbin's patch:

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html

Test results look OK, barring a suspected harness issue (lib-83
failing with a timeout for nvptx, though it works fine from the command
line).

OK for gomp4 branch? I could commit Ilya's patch there too if so.

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/mkoffload.c (process): Support variable mapping.

libgomp/
* libgomp.h (acc_dispatch_t): Remove open_device_func,
close_device_func, get_device_num_func, set_device_num_func,
target_data members. Change create_thread_data_func argument to
device number instead of generic pointer.
* oacc-async.c (assert.h): Include.
(acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
(acc_wait_all, acc_wait_all_async): Use current host thread's
active device, not base_dev.
* oacc-cuda.c (acc_get_current_cuda_device)
(acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_dispatch): Don't set open_device_func,
close_device_func, get_device_num_func or set_device_num_func.
* oacc-init.c (base_dev, init_key): Remove.
(cached_base_dev): New.
(name_of_acc_device_t): New.
(acc_init_1): Initialise default-numbered device, not zeroth.
(acc_shutdown_1): Close all devices of a given type.
(goacc_destroy_thread): Don't use base_dev.
(lazy_open, lazy_init, lazy_init_and_open): Remove.
(goacc_attach_host_thread_to_device): New.
(acc_init): Reimplement with goacc_attach_host_thread_to_device.
(acc_get_num_devices): Don't use base_dev.
(acc_set_device_type): Reimplement.
(acc_get_device_type): Don't use base_dev.
(acc_get_device_num): Tweak logic.
(acc_set_device_num): Likewise.
(goacc_runtime_initialize): Initialize cached_base_dev not base_dev.
(goacc_lazy_initialize): Reimplement with acc_init and
goacc_attach_host_thread_to_device.
* oacc-int.h (goacc_thread): Add base_dev field.
(base_dev): Remove extern declaration.
(goacc_attach_host_thread_to_device): Add prototype.
* oacc-mem.c (acc_malloc): Use current thread's device instead of
base_dev.
(acc_free): Likewise.
(acc_memcpy_to_device): Likewise.
(acc_memcpy_from_device): Likewise.
* oacc-parallel.c (select_acc_device): Remove. Replace calls with
goacc_lazy_initialize (throughout).
* target.c (gomp_load_plugin_for_device): Don't initialise openacc
open_device, close_device, get_device_num or set_device_num hooks.
Don't initialise target_data.
* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device)
(GOMP_OFFLOAD_openacc_close_device)
(GOMP_OFFLOAD_openacc_get_device_num)
(GOMP_OFFLOAD_openacc_set_device_num): Remove.
(GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument
to int.
* plugin/plugin-nvptx.c (pthread.h): Include.
(ptx_inited): Remove.
(instantiated_devices, ptx_dev_lock): New.
(struct ptx_image_data): New.
(ptx_devices, ptx_images, ptx_image_lock): New.
(nvptx_get_num_devices): Remove forward declaration.
(nvptx_init): Change return type to bool.
(nvptx_fini): Remove.
(nvptx_attach_host_thread_to_device): New.
(nvptx_open_device): Remove struct ptx_device* instead of void*.
(nvptx_close_device): Change argument 

Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-02-23 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  [...]

   gcc/ada/
   * gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
   (DEF_FUNCTION_TYPE_VAR_12): New macros.

Committed in r220910:

commit 11d2c7638e26ab69df7167474c9aa8f5d4114703
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Feb 23 10:06:49 2015 +

Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_*.

gcc/ada/
* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
(DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220910 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ada/ChangeLog |5 +
 gcc/ada/gcc-interface/utils.c |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git gcc/ada/ChangeLog gcc/ada/ChangeLog
index ddd6e10..06a51ac 100644
--- gcc/ada/ChangeLog
+++ gcc/ada/ChangeLog
@@ -1,3 +1,8 @@
+2015-02-23  Thomas Schwinge  tho...@codesourcery.com
+
+   * gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8)
+   (DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter.
+
 2015-02-22  Arnaud Charlet  char...@adacore.com
 
* doc/Makefile: postprocess texinfo files to update @dircategory
diff --git gcc/ada/gcc-interface/utils.c gcc/ada/gcc-interface/utils.c
index 44dad7b..d076da7 100644
--- gcc/ada/gcc-interface/utils.c
+++ gcc/ada/gcc-interface/utils.c
@@ -5477,11 +5477,11 @@ install_builtin_function_types (void)
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
 #define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7, ARG8)   \
-  def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \
+  def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \
   ARG7, ARG8);
 #define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
-  def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \
+  def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,
\
   ARG7, ARG8, ARG9, ARG10, ARG11, ARG12);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-18 Thread Ilya Verbin
On Wed, Feb 04, 2015 at 15:05:45 +, Julian Brown wrote:
 This (WIP) patch is based on top of a version of your patch that I
 merged to our internal branch: that's still the easiest way for me to
 test the PTX backend (with unloading support) at present, and it passes
 libgomp testing that way. Trunk should be fairly close, but I haven't
 tried applying it there yet.
 
 The major changes are:
 
 * The removal of the OpenACC-specific plugin hooks open_device,
   close_device, set_device_num and get_device_num. The functionality
   has been moved into the init/fini hooks (for the first two) or moved
   into the target-independent OpenACC parts, respectively.
 
 * The PTX mkoffload utility has been extended to support variables as
   well as function mapping, to fill out support for the load/unload
   image hooks. (Not really tested so far!)
 
 * The plugin hooks that are shared between OpenMP and OpenACC now
   support the device number argument properly: that should help with
   (eventually) unifying the plugin interface for the two APIs. (With
   set_device_num and get_device_num removed, the plugin is stateless
   with respect to which device is currently active. The rest of the
   OpenACC hooks -- async functions, etc. -- should probably be changed
   to take a device number argument too, but that could be a follow-on
   patch.)
 
 * The limitation of having only one type of device active simultaneously
   in the OpenACC runtime has (theoretically!) been removed.
 
 Thoughts?

Up.

I have no comments here since I'm not familiar with OpenACC and PTX, but I hope
that Thomas and Jakub will review this and my corresponding patches [1], [2]
before the final closure of the trunk.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html
[2] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01912.html

  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-17 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.

This whole file is scheduled to go away: the routines are to be replaced
by builtins which are expanded in the nvptx backend, but until we're
there, here's a patch to make it at least work; committed to trunk in
r220768:

commit f816d7a63c8bc11c81080a0b34bf587d46b6f4c6
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Feb 17 18:05:24 2015 +

libgomp: Make nvptx helper routines self-contained.

libgomp/
* oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid,
GOACC_ctaid, and GOACC_nctaid routines.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220768 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  |6 ++
 libgomp/oacc-ptx.h |  224 
 2 files changed, 230 insertions(+)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 6c24531..2c32d9e 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,9 @@
+2015-02-17  Thomas Schwinge  tho...@codesourcery.com
+   Cesar Philippidis  ce...@codesourcery.com
+
+   * oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid,
+   GOACC_ctaid, and GOACC_nctaid routines.
+
 2015-02-11  Jakub Jelinek  ja...@redhat.com
 
PR c/64824
diff --git libgomp/oacc-ptx.h libgomp/oacc-ptx.h
index 13ff86f..2419a46 100644
--- libgomp/oacc-ptx.h
+++ libgomp/oacc-ptx.h
@@ -101,9 +101,233 @@
   .version 3.1\n \
   .target sm_30\n \
   .address_size 64\n \
+  .visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 
%in_ar1);\n \
+  .visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 
%in_ar1);\n \
+  .visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 
%in_ar1);\n \
+  .visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 
%in_ar1);\n \
   .visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n \
   .visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n \
   .extern .func abort;\n \
+  .visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n 
\
+  {\n \
+  .reg .u32 %ar1;\n \
+  .reg .u32 %retval;\n \
+  .reg .u64 %hr10;\n \
+  .reg .u32 %r22;\n \
+  .reg .u32 %r23;\n \
+  .reg .u32 %r24;\n \
+  .reg .u32 %r25;\n \
+  .reg .u32 %r26;\n \
+  .reg .u32 %r27;\n \
+  .reg .u32 %r28;\n \
+  .reg .u32 %r29;\n \
+  .reg .pred %r30;\n \
+  .reg .u32 %r31;\n \
+  .reg .pred %r32;\n \
+  .reg .u32 %r33;\n \
+  .reg .pred %r34;\n \
+  .local .align 8 .b8 %frame[4];\n \
+  ld.param.u32 %ar1,[%in_ar1];\n \
+  mov.u32 %r27,%ar1;\n \
+  st.local.u32 [%frame],%r27;\n \
+  ld.local.u32 %r28,[%frame];\n \
+  mov.u32 %r29,1;\n  \
+  setp.eq.u32 %r30,%r28,%r29;\n  \
+  @%r30 bra $L4;\n   \
+  mov.u32 %r31,2;\n  \
+  setp.eq.u32 %r32,%r28,%r31;\n  \
+  @%r32 bra $L5;\n   \
+  mov.u32 %r33,0;\n  \
+  setp.eq.u32 %r34,%r28,%r33;\n  \
+  @!%r34 bra $L8;\n  \
+  mov.u32 %r23,%tid.x;\n \
+  mov.u32 %r22,%r23;\n   \
+  bra $L7;\n \
+  $L4:\n \
+  mov.u32 %r24,%tid.y;\n \
+  mov.u32 %r22,%r24;\n   \
+  bra $L7;\n \
+  $L5:\n \
+  mov.u32 %r25,%tid.z;\n \
+  mov.u32 %r22,%r25;\n   \
+  bra $L7;\n \
+  $L8:\n \
+  {\n
\
+  {\n
\
+  call abort;\n  \
+  }\n
\
+  }\n
\
+  $L7:\n \
+  mov.u32 %r26,%r22;\n   \
+  mov.u32 %retval,%r26;\n\
+  st.param.u32 [%out_retval],%retval;\n  \
+  ret;\n  

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-04 Thread Julian Brown
On Tue, 3 Feb 2015 23:01:04 +0300
Ilya Verbin iver...@gmail.com wrote:

 On 03 Feb 13:00, Julian Brown wrote:
  On Tue, 3 Feb 2015 14:28:44 +0300
  Ilya Verbin iver...@gmail.com wrote:
   On 27 Jan 14:07, Julian Brown wrote:
On Mon, 26 Jan 2015 17:34:26 +0300
Ilya Verbin iver...@gmail.com wrote:
 Here is my current patch, it works for OpenMP-MIC, but
 obviously will not work for PTX, since it requires
 symmetrical changes in the plugin.  Could you please take a
 look, whether it is possible to support this new interface in
 PTX plugin?

I think it can probably be made to work. I'll have a look in
more detail.
   
   Do you have any progress on this?
  
  I'm still working on a patch to update OpenACC support and the PTX
  backend to use load/unload_image and to unify
  initialisation/opening. So far I think the answer is basically
  yes, the new interface can be supported, though I might request a
  minor tweak -- e.g. that load_image takes an extra void **
  argument so that a libgomp backend can allocate a block of generic
  metadata relating to the image, then that same block would be
  passed (void *) to the unload hook so the backend can use it there
  and deallocate it when it's finished with.
  
  Would that be possible? (It'd mostly be for a CUmodule handle:
  this could be stashed away somewhere within the nvptx backend, but
  it might be neater to put it in generic code since it'll probably
  be useful for other backends anyway.)
 
 An extra argument is not a problem, however I don't quite get the
 idea. PTX plugin allocates some data while loading, and needs this
 data while unloading?  Then why not to create a hash table with
 image_ptr - metadata mapping inside the plugin? [...]

Right -- that's what I meant by could be stashed away somewhere within
the nvptx backend. I just thought that retaining a generic chunk of
state for each (JIT-compiled, in this case) block of code might be
something that would be useful for other targets too. I've kept
the required information (for now at least) within the nvptx backend as
an associative list.

This (WIP) patch is based on top of a version of your patch that I
merged to our internal branch: that's still the easiest way for me to
test the PTX backend (with unloading support) at present, and it passes
libgomp testing that way. Trunk should be fairly close, but I haven't
tried applying it there yet.

The major changes are:

* The removal of the OpenACC-specific plugin hooks open_device,
  close_device, set_device_num and get_device_num. The functionality
  has been moved into the init/fini hooks (for the first two) or moved
  into the target-independent OpenACC parts, respectively.

* The PTX mkoffload utility has been extended to support variables as
  well as function mapping, to fill out support for the load/unload
  image hooks. (Not really tested so far!)

* The plugin hooks that are shared between OpenMP and OpenACC now
  support the device number argument properly: that should help with
  (eventually) unifying the plugin interface for the two APIs. (With
  set_device_num and get_device_num removed, the plugin is stateless
  with respect to which device is currently active. The rest of the
  OpenACC hooks -- async functions, etc. -- should probably be changed
  to take a device number argument too, but that could be a follow-on
  patch.)

* The limitation of having only one type of device active simultaneously
  in the OpenACC runtime has (theoretically!) been removed.

Thoughts?

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/mkoffload.c (process): Support variable mapping.

libgomp/
* libgomp.h (acc_dispatch_t): Remove open_device_func,
close_device_func, get_device_num_func, set_device_num_func,
target_data members. Change create_thread_data_func argument to
device number instead of generic pointer.
* oacc-async.c (assert.h): Include.
(acc_async_test, acc_async_test_all, acc_wait, acc_wait_async)
(acc_wait_all, acc_wait_all_async): Use current host thread's
active device, not base_dev.
* oacc-cuda.c (acc_get_current_cuda_device)
(acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Likewise.
* oacc-host.c (host_dispatch): Don't set open_device_func,
close_device_func, get_device_num_func or set_device_num_func.
* oacc-init.c (base_dev, init_key): Remove.
(cached_base_dev): New.
(name_of_acc_device_t): New.
(acc_init_1): Initialise default-numbered device, not zeroth.
(acc_shutdown_1): Close all devices of a given type.
(goacc_destroy_thread): Don't use base_dev.
(lazy_open, lazy_init, lazy_init_and_open): Remove.
(goacc_attach_host_thread_to_device): New.
(acc_init): Reimplement with goacc_attach_host_thread_to_device.
(acc_get_num_devices): Don't use base_dev.
(acc_set_device_type): Reimplement.
(acc_get_device_type): Don't use base_dev.
(acc_get_device_num): Tweak 

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-03 Thread Ilya Verbin
Hi Julian!

On 27 Jan 14:07, Julian Brown wrote:
 On Mon, 26 Jan 2015 17:34:26 +0300
 Ilya Verbin iver...@gmail.com wrote:
  Here is my current patch, it works for OpenMP-MIC, but obviously
  will not work for PTX, since it requires symmetrical changes in the
  plugin.  Could you please take a look, whether it is possible to
  support this new interface in PTX plugin?
 
 I think it can probably be made to work. I'll have a look in more
 detail.

Do you have any progress on this?

Thanks,
  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-03 Thread Julian Brown
On Tue, 3 Feb 2015 14:28:44 +0300
Ilya Verbin iver...@gmail.com wrote:

 Hi Julian!
 
 On 27 Jan 14:07, Julian Brown wrote:
  On Mon, 26 Jan 2015 17:34:26 +0300
  Ilya Verbin iver...@gmail.com wrote:
   Here is my current patch, it works for OpenMP-MIC, but obviously
   will not work for PTX, since it requires symmetrical changes in
   the plugin.  Could you please take a look, whether it is possible
   to support this new interface in PTX plugin?
  
  I think it can probably be made to work. I'll have a look in more
  detail.
 
 Do you have any progress on this?

I'm still working on a patch to update OpenACC support and the PTX
backend to use load/unload_image and to unify initialisation/opening.
So far I think the answer is basically yes, the new interface can be
supported, though I might request a minor tweak -- e.g. that
load_image takes an extra void ** argument so that a libgomp backend
can allocate a block of generic metadata relating to the image, then
that same block would be passed (void *) to the unload hook so the
backend can use it there and deallocate it when it's finished with.

Would that be possible? (It'd mostly be for a CUmodule handle: this
could be stashed away somewhere within the nvptx backend, but it might
be neater to put it in generic code since it'll probably be useful for
other backends anyway.)

Thanks,

Julian


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-02-03 Thread Ilya Verbin
On 03 Feb 13:00, Julian Brown wrote:
 On Tue, 3 Feb 2015 14:28:44 +0300
 Ilya Verbin iver...@gmail.com wrote:
  On 27 Jan 14:07, Julian Brown wrote:
   On Mon, 26 Jan 2015 17:34:26 +0300
   Ilya Verbin iver...@gmail.com wrote:
Here is my current patch, it works for OpenMP-MIC, but obviously
will not work for PTX, since it requires symmetrical changes in
the plugin.  Could you please take a look, whether it is possible
to support this new interface in PTX plugin?
   
   I think it can probably be made to work. I'll have a look in more
   detail.
  
  Do you have any progress on this?
 
 I'm still working on a patch to update OpenACC support and the PTX
 backend to use load/unload_image and to unify initialisation/opening.
 So far I think the answer is basically yes, the new interface can be
 supported, though I might request a minor tweak -- e.g. that
 load_image takes an extra void ** argument so that a libgomp backend
 can allocate a block of generic metadata relating to the image, then
 that same block would be passed (void *) to the unload hook so the
 backend can use it there and deallocate it when it's finished with.
 
 Would that be possible? (It'd mostly be for a CUmodule handle: this
 could be stashed away somewhere within the nvptx backend, but it might
 be neater to put it in generic code since it'll probably be useful for
 other backends anyway.)

An extra argument is not a problem, however I don't quite get the idea.
PTX plugin allocates some data while loading, and needs this data while
unloading?  Then why not to create a hash table with image_ptr - metadata
mapping inside the plugin?  In this case, to the unload hook can deallocate
metadata using the image_ptr key.  Since this metadata is target-specific,
I believe it would be better to keep it inside the plugin.

  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-27 Thread Julian Brown
On Mon, 26 Jan 2015 17:34:26 +0300
Ilya Verbin iver...@gmail.com wrote:

 Here is my current patch, it works for OpenMP-MIC, but obviously
 will not work for PTX, since it requires symmetrical changes in the
 plugin.  Could you please take a look, whether it is possible to
 support this new interface in PTX plugin?

I think it can probably be made to work. I'll have a look in more
detail.

Thanks,

Julian


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-27 Thread Jack Howarth
Thomas,
 Any plans to fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64635 soon? On x86_64
darwin, the OpenACC merge resulted a huge number of failures in the
libgomp test suite…


=== libgomp Summary ===
# of expected passes 10628
# of unexpected failures 724
# of unsupported tests 562

which are resolved with a fix similar to
https://gcc.gnu.org/bugzilla/attachment.cgi?id=34480.
   Jack


On Mon, Jan 26, 2015 at 8:44 AM, Thomas Schwinge
tho...@codesourcery.com wrote:
 Hi!

 Sorry for the late answer -- I've been on sick leave, and just now
 returning to work.  Julian, would you please have a look at the following
 issues?

   In r219682, I have committed to trunk our current set of OpenACC changes,
   which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
   been contributing!

 On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin iver...@gmail.com wrote:
 On 17 Jan 02:16, Ilya Verbin wrote:
  Unfortunately, it broke offloading from shared libraries (I mean common 
  libs
  with NEEDED entries, not dlopened).

 Sorry for that!

  Such things are not covered by the
  testsuite, that's why you missed this issue.  Here is a simple testcase:

 http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E

 Probably a good motivation for adding such a test case.  ;-)

  So, you don't assume that a device can have multiple images from multiple 
  libs?

 Ping?

 This probably is just a bug that we introduced with our changes?
 (Julian?)


 Also, could you please explain, why did you divide a device initialization 
 into
 two functions -- gomp_init_device and gomp_init_tables?

 As I understand it (again, Julian, please correct me if I got that
 wrong), the reason is that for OpenACC support, we need these as two
 separate (independent) actions.  Is this causing problems for OpenMP
 offloading?


 Currently I'm trying to rebase on trunk my old patch, which fixes offloading
 from dlopened libraries: 
 http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
 It works for OpenMP and MIC, but I don't know how not to break OpenACC and 
 PTX.


 Grüße,
  Thomas


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-27 Thread Julian Brown
On Mon, 26 Jan 2015 14:44:19 +0100
Thomas Schwinge tho...@codesourcery.com wrote:

  On 17 Jan 02:16, Ilya Verbin wrote:
   Unfortunately, it broke offloading from shared libraries (I mean
   common libs with NEEDED entries, not dlopened).
 
 Sorry for that!
 
   Such things are not covered by the
   testsuite, that's why you missed this issue.  Here is a simple
   testcase:
 
 http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E
 
 Probably a good motivation for adding such a test case.  ;-)
 
   So, you don't assume that a device can have multiple images from
   multiple libs?
  
  Ping?
 
 This probably is just a bug that we introduced with our changes?
 (Julian?)

AFAICR, we haven't yet figured out how to make (shared) libraries work
with PTX. Actually I'm not entirely sure if static libraries containing
PTX code will work either. But, multiple images (e.g. from different
object files) are supported, via the loop in gomp_target_init.

(The semantics of gomp_register_image_for_device were changed, but not
-- intentionally! -- to limit the number of offloaded images to one.)

  Also, could you please explain, why did you divide a device
  initialization into two functions -- gomp_init_device and
  gomp_init_tables?
 
 As I understand it (again, Julian, please correct me if I got that
 wrong), the reason is that for OpenACC support, we need these as two
 separate (independent) actions.  Is this causing problems for OpenMP
 offloading?

This was certainly necessary at some point, when the support for
multiple devices of the same type in the OpenACC runtime was delegated
entirely to target-dependent code. Later (after one round of
refactoring), the gomp_device_descr and the memory map were still
separate, with the former possibly representing a number of devices,
and the latter having independent copies for each instance of a device.

That's largely been refactored (again) away now though -- a
gomp_device_descr and its memory map are stored together, per-device
instance. So this separation of their initialisation can probably go
away, although some (somewhat delicate) code in oacc-init.c would need
to be tweaked.

Julian


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-26 Thread Thomas Schwinge
Hi!

Sorry for the late answer -- I've been on sick leave, and just now
returning to work.  Julian, would you please have a look at the following
issues?

   In r219682, I have committed to trunk our current set of OpenACC changes,
   which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
   been contributing!

On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin iver...@gmail.com wrote:
 On 17 Jan 02:16, Ilya Verbin wrote:
  Unfortunately, it broke offloading from shared libraries (I mean common libs
  with NEEDED entries, not dlopened).

Sorry for that!

  Such things are not covered by the
  testsuite, that's why you missed this issue.  Here is a simple testcase:

http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E

Probably a good motivation for adding such a test case.  ;-)

  So, you don't assume that a device can have multiple images from multiple 
  libs?
 
 Ping?

This probably is just a bug that we introduced with our changes?
(Julian?)


 Also, could you please explain, why did you divide a device initialization 
 into
 two functions -- gomp_init_device and gomp_init_tables?

As I understand it (again, Julian, please correct me if I got that
wrong), the reason is that for OpenACC support, we need these as two
separate (independent) actions.  Is this causing problems for OpenMP
offloading?


 Currently I'm trying to rebase on trunk my old patch, which fixes offloading
 from dlopened libraries: 
 http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
 It works for OpenMP and MIC, but I don't know how not to break OpenACC and 
 PTX.


Grüße,
 Thomas


pgpNuLLIR31d6.pgp
Description: PGP signature


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-26 Thread Ilya Verbin
On 26 Jan 14:44, Thomas Schwinge wrote:
 On 17 Jan 02:16, Ilya Verbin wrote:
  Such things are not covered by the
  testsuite, that's why you missed this issue.  Here is a simple testcase:
 
 http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E
 
 Probably a good motivation for adding such a test case.  ;-)

I thought about it, but I don't know how to compile 2 binaries and run one of
them using dejagnu.

  So, you don't assume that a device can have multiple images from multiple 
  libs?
 
 This probably is just a bug that we introduced with our changes?
 (Julian?)
 
  Also, could you please explain, why did you divide a device initialization 
  into
  two functions -- gomp_init_device and gomp_init_tables?
 
 As I understand it (again, Julian, please correct me if I got that
 wrong), the reason is that for OpenACC support, we need these as two
 separate (independent) actions.  Is this causing problems for OpenMP
 offloading?

I'm asking since in this patch 
http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
I tried to change libgomp-plugin interface to enable offloading from libs,
loaded at any time.
My proposal was to replace GOMP_OFFLOAD_register_image and
GOMP_OFFLOAD_get_table with GOMP_OFFLOAD_[un]load_image.
When target device is initialized, GOMP_OFFLOAD_load_image registers one image
in the plugin and returns corresponding target addresses for the image.
The mapping between host and target addresses happens as previously.
I hope that this approach is suitable for both MIC and PTX.

Here is my current patch, it works for OpenMP-MIC, but obviously will not work
for PTX, since it requires symmetrical changes in the plugin.  Could you please
take a look, whether it is possible to support this new interface in PTX plugin?


diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index d9cbff5..1072ae4 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -51,14 +51,12 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
 };
 
-/* Auxiliary struct, used for transferring a host-target address range mapping
-   from plugin to libgomp.  */
-struct mapping_table
+/* Auxiliary struct, used for transferring pairs of addresses from plugin
+   to libgomp.  */
+struct addr_pair
 {
-  uintptr_t host_start;
-  uintptr_t host_end;
-  uintptr_t tgt_start;
-  uintptr_t tgt_end;
+  uintptr_t start;
+  uintptr_t end;
 };
 
 /* Miscellaneous functions.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3089401..4e021f9 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -773,10 +773,10 @@ struct gomp_device_descr
   unsigned int (*get_caps_func) (void);
   int (*get_type_func) (void);
   int (*get_num_devices_func) (void);
-  void (*register_image_func) (void *, void *);
   void (*init_device_func) (int);
   void (*fini_device_func) (int);
-  int (*get_table_func) (int, struct mapping_table **);
+  int (*load_image_func) (int, void *, struct addr_pair **);
+  void (*unload_image_func) (int, void *);
   void *(*alloc_func) (int, size_t);
   void (*free_func) (int, void *);
   void *(*dev2host_func) (int, void *, const void *, size_t);
@@ -793,9 +793,6 @@ struct gomp_device_descr
   /* Set to true when device is initialized.  */
   bool is_initialized;
 
-  /* True when offload regions have been registered with this device.  */
-  bool offload_regions_registered;
-
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
  members.  */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index f44174e..2b2b953 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -231,6 +231,7 @@ GOMP_4.0 {
 GOMP_4.0.1 {
   global:
GOMP_offload_register;
+   GOMP_offload_unregister;
 } GOMP_4.0;
 
 OACC_2.0 {
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 6aeb1e7..5d67c6c 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -43,10 +43,10 @@ static struct gomp_device_descr host_dispatch =
 .get_caps_func = GOMP_OFFLOAD_get_caps,
 .get_type_func = GOMP_OFFLOAD_get_type,
 .get_num_devices_func = GOMP_OFFLOAD_get_num_devices,
-.register_image_func = GOMP_OFFLOAD_register_image,
 .init_device_func = GOMP_OFFLOAD_init_device,
 .fini_device_func = GOMP_OFFLOAD_fini_device,
-.get_table_func = GOMP_OFFLOAD_get_table,
+.load_image_func = GOMP_OFFLOAD_load_image,
+.unload_image_func = GOMP_OFFLOAD_unload_image,
 .alloc_func = GOMP_OFFLOAD_alloc,
 .free_func = GOMP_OFFLOAD_free,
 .dev2host_func = GOMP_OFFLOAD_dev2host,
@@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch =
 .mem_map.is_initialized = false,
 .mem_map.splay_tree.root = NULL,
 .is_initialized = false,
-.offload_regions_registered = false,
 
 .openacc = {
   .open_device_func = GOMP_OFFLOAD_openacc_open_device,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-23 Thread Jakub Jelinek
On Fri, Jan 23, 2015 at 08:20:53PM +0300, Ilya Verbin wrote:
 On 17 Jan 02:16, Ilya Verbin wrote:
  Hi!
  
  Unfortunately, it broke offloading from shared libraries (I mean common libs
  with NEEDED entries, not dlopened).  Such things are not covered by the
  testsuite, that's why you missed this issue.  Here is a simple testcase:
  ... 
  So, you don't assume that a device can have multiple images from multiple 
  libs?
 
 Ping?
 
 Also, could you please explain, why did you divide a device initialization 
 into
 two functions -- gomp_init_device and gomp_init_tables?
 
 Currently I'm trying to rebase on trunk my old patch, which fixes offloading
 from dlopened libraries: 
 http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
 It works for OpenMP and MIC, but I don't know how not to break OpenACC and 
 PTX.

There is also the problem that GOMP_offload_register doesn't use any
locking, so one thread could be in the middle of dlopening some shared
library and doing GOMP_offload_register in there, and another thread
calling gomp_target_init at the same time, so you could reference freed
memory if GOMP_offload_register had to reallocate etc.

Jakub


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-23 Thread Ilya Verbin
On 17 Jan 02:16, Ilya Verbin wrote:
 Hi!
 
 Unfortunately, it broke offloading from shared libraries (I mean common libs
 with NEEDED entries, not dlopened).  Such things are not covered by the
 testsuite, that's why you missed this issue.  Here is a simple testcase:
 ... 
 So, you don't assume that a device can have multiple images from multiple 
 libs?

Ping?

Also, could you please explain, why did you divide a device initialization into
two functions -- gomp_init_device and gomp_init_tables?

Currently I'm trying to rebase on trunk my old patch, which fixes offloading
from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html
It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX.

  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Uros Bizjak
Hello!

 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
 been contributing!

This patch breaks bootstrap on alpha-linux-gnu:

/space/homedirs/uros/gcc-build/./gcc/xgcc
-B/space/homedirs/uros/gcc-build/./gcc/
-B/usr/local/alphaev68-unknown-linux-gnu/bin/
-B/usr/local/alphaev68-unknown-linux-gnu/lib/ -isystem
/usr/local/alphaev68-unknown-linux-gnu/include -isystem
/usr/local/alphaev68-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I.
-I../../../gcc-svn/trunk/libgomp
-I../../../gcc-svn/trunk/libgomp/config/linux/alpha
-I../../../gcc-svn/trunk/libgomp/config/linux
-I../../../gcc-svn/trunk/libgomp/config/posix
-I../../../gcc-svn/trunk/libgomp
-I../../../gcc-svn/trunk/libgomp/../include -Wall -Werror -pthread
-ftls-model=initial-exec -g -O2 -mieee -MT oacc-init.lo -MD -MP -MF
.deps/oacc-init.Tpo -c ../../../gcc-svn/trunk/libgomp/oacc-init.c
-fPIC -DPIC -o .libs/oacc-init.o
../../../gcc-svn/trunk/libgomp/oacc-init.c: In function ‘acc_on_device’:
../../../gcc-svn/trunk/libgomp/oacc-init.c:571:10: internal compiler
error: Segmentation fault
   return __builtin_acc_on_device (dev);
  ^
0x120bf4c13 crash_signal
../../gcc-svn/trunk/gcc/toplev.c:381
0x1203fc55c expand_builtin_acc_on_device
../../gcc-svn/trunk/gcc/builtins.c:5933
0x120400063 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
../../gcc-svn/trunk/gcc/builtins.c:7087
0x12063b5b3 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-svn/trunk/gcc/expr.c:10488
0x12062cc93 expand_expr_real(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-svn/trunk/gcc/expr.c:8017
0x1206090bb expand_expr
../../gcc-svn/trunk/gcc/expr.h:254
0x12062e2a7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../gcc-svn/trunk/gcc/expr.c:8248
0x12063ce2f expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-svn/trunk/gcc/expr.c:10779
0x12062cc93 expand_expr_real(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-svn/trunk/gcc/expr.c:8017
0x1206090bb expand_expr
../../gcc-svn/trunk/gcc/expr.h:254


(gdb) r
Starting program: /space/uros/gcc-build/gcc/cc1 -quiet -g -O2 -mieee oacc-init.i

Program received signal SIGSEGV, Segmentation fault.
0x0001203fc55c in expand_builtin_acc_on_device (exp=0x2b17688,
target=0x0) at ../../gcc-svn/trunk/gcc/builtins.c:5933
5933  if (!REG_P (target) || GET_MODE (target) != target_mode)
(gdb) p target
$1 = (rtx) 0x0
(gdb) list
5928#else
5929  v1 = GEN_INT (GOMP_DEVICE_NONE);
5930  v2 = GEN_INT (GOMP_DEVICE_HOST);
5931#endif
5932  machine_mode target_mode = TYPE_MODE (integer_type_node);
5933  if (!REG_P (target) || GET_MODE (target) != target_mode)
5934target = gen_reg_rtx (target_mode);
5935  emit_move_insn (target, const1_rtx);
5936  rtx_code_label *done_label = gen_label_rtx ();
5937  do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX,

target is NULL in expand_builtin_acc_on_device.

(I can open a PR with a preprocessed file if requested).

Uros.


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Uros Bizjak
On Fri, Jan 16, 2015 at 11:39 AM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
 been contributing!

 This patch breaks bootstrap on alpha-linux-gnu:

 target is NULL in expand_builtin_acc_on_device.

Following patch allows bootstrap to continue:

--cut here--
Index: builtins.c
===
--- builtins.c  (revision 219712)
+++ builtins.c  (working copy)
@@ -5930,7 +5930,7 @@ expand_builtin_acc_on_device (tree exp, rtx target
   v2 = GEN_INT (GOMP_DEVICE_HOST);
 #endif
   machine_mode target_mode = TYPE_MODE (integer_type_node);
-  if (!REG_P (target) || GET_MODE (target) != target_mode)
+  if (!target || !register_operand (target, target_mode))
 target = gen_reg_rtx (target_mode);
   emit_move_insn (target, const1_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
--cut here--

Uros.


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Jakub Jelinek
On Fri, Jan 16, 2015 at 11:46:35AM +0100, Uros Bizjak wrote:
 On Fri, Jan 16, 2015 at 11:39 AM, Uros Bizjak ubiz...@gmail.com wrote:
  Hello!
 
  In r219682, I have committed to trunk our current set of OpenACC changes,
  which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
  been contributing!
 
  This patch breaks bootstrap on alpha-linux-gnu:
 
  target is NULL in expand_builtin_acc_on_device.
 
 Following patch allows bootstrap to continue:

Ok for trunk with proper ChangeLog entry.  Thanks.

 --- builtins.c  (revision 219712)
 +++ builtins.c  (working copy)
 @@ -5930,7 +5930,7 @@ expand_builtin_acc_on_device (tree exp, rtx target
v2 = GEN_INT (GOMP_DEVICE_HOST);
  #endif
machine_mode target_mode = TYPE_MODE (integer_type_node);
 -  if (!REG_P (target) || GET_MODE (target) != target_mode)
 +  if (!target || !register_operand (target, target_mode))
  target = gen_reg_rtx (target_mode);
emit_move_insn (target, const1_rtx);
rtx_code_label *done_label = gen_label_rtx ();
 --cut here--
 
 Uros.

Jakub


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Gerald Pfeifer
Hi Thomas,

On Thursday 2015-01-15 21:20, Thomas Schwinge wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
 been contributing!

this breaks bootstrap on FreeBSD 8/amd64 from what I can tell:

libtool: compile:  /scratch/tmp/gerald/OBJ-0116-1138/./gcc/xgcc 
-B/scratch/tmp/gerald/OBJ-0116-1138/./gcc/ 
-B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/bin/ 
-B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/lib/ -isystem 
/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/include -isystem 
/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/sys-include 
-DHAVE_CONFIG_H -I. -I/sc ratch/tmp/gerald/gcc-HEAD/libgomp 
-I/scratch/tmp/gerald/gcc-HEAD/libgomp/config/ posix 
-I/scratch/tmp/gerald/gcc-HEAD/libgomp 
-I/scratch/tmp/gerald/gcc-HEAD/libg omp/../include -Wall -pthread 
-Werror -g -O2 -MT work.lo -MD -MP -MF .deps/work. Tpo -c 
/scratch/tmp/gerald/gcc-HEAD/libgomp/work.c -o work.o /dev/null 21 
/scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: 
alloca. h: No such file or directory compilation terminated.


% find /usr/include/ -name alloca.h
%

i.e., FreeBSD does not feature the alloca.h header and declares
alloca() in stdlib.h.

Gerald


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Gerald Pfeifer
Once I work around the previous failure, I quickly get another one on
FreeBSD 8.4/amd64:

/scratch/tmp/gerald/gcc-HEAD/libgomp/target.c:67:12: error: 
\xe2\x80\x98num_devices\xe2\x80\x99 defined but not used 
[-Werror=unused-variable]
 static int num_devices;
^

This one did not require autoconf, so I went ahead and fixed it myself. 
;-)  Committed after successful bootstrap on x86_64-unknown-freebsd8.4 
(after hacking around the other failure).

Gerald


2015-01-16  Gerald Pfeifer  ger...@pfeifer.com

* target.c (num_devices): Guard with PLUGIN_SUPPORT.

Index: target.c
===
--- target.c(revision 219741)
+++ target.c(revision 219742)
@@ -63,8 +63,10 @@
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
+#ifdef PLUGIN_SUPPORT
 /* Total number of available devices.  */
 static int num_devices;
+#endif
 
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Jakub Jelinek
On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote:
   * builtin-types.def (BT_FN_VOID_INT_INT_VAR)
   (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
   (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
   New function types.

This broke bootstrap with --enable-languages=jit.  Fixed thusly, committed
as obvious:

2015-01-16  Jakub Jelinek  ja...@redhat.com

* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of
last argument.
(DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and
undef afterwards.
* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
Likewise.

--- gcc/jit/jit-builtins.h.jj   2015-01-14 11:00:22.0 +0100
+++ gcc/jit/jit-builtins.h  2015-01-16 15:35:13.440088390 +0100
@@ -45,7 +45,12 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
 #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME,
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
-#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
+#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
+  NAME,
+#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6, ARG7, ARG8) NAME,
+#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
   NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include builtin-types.def
@@ -65,6 +70,8 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_8
+#undef DEF_FUNCTION_TYPE_VAR_12
 #undef DEF_POINTER_TYPE
   BT_LAST
 }; /* enum jit_builtin_type */
--- gcc/jit/jit-builtins.c.jj   2015-01-14 11:00:22.0 +0100
+++ gcc/jit/jit-builtins.c  2015-01-16 15:35:02.176276537 +0100
@@ -313,6 +313,15 @@ builtins_manager::make_type (enum jit_bu
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, 
ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, 
ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6, ARG7, ARG8) \
+  case ENUM: return make_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, \
+ ARG4, ARG5, ARG6, ARG7, ARG8);
+#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \
+  case ENUM: return make_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, \
+ ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \
+ ARG10, ARG11, ARG12);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   case ENUM: return make_ptr_type (ENUM, TYPE);
 
@@ -334,6 +343,8 @@ builtins_manager::make_type (enum jit_bu
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_8
+#undef DEF_FUNCTION_TYPE_VAR_12
 #undef DEF_POINTER_TYPE
 
 default:


Jakub


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread David Malcolm
On Fri, 2015-01-16 at 15:39 +0100, Jakub Jelinek wrote:
 On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote:
  * builtin-types.def (BT_FN_VOID_INT_INT_VAR)
  (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR)
  
  (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
  New function types.
 
 This broke bootstrap with --enable-languages=jit.  Fixed thusly, committed
 as obvious:
 
 2015-01-16  Jakub Jelinek  ja...@redhat.com
 
   * jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of
   last argument.
   (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and
   undef afterwards.
   * jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12):
   Likewise.

Thanks!




Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Thomas Schwinge
Hi Gerald!

On Fri, 16 Jan 2015 13:32:10 +0100 (CET), Gerald Pfeifer ger...@pfeifer.com 
wrote:
 On Thursday 2015-01-15 21:20, Thomas Schwinge wrote:
  In r219682, I have committed to trunk our current set of OpenACC changes,
  which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
  been contributing!
 
 this breaks bootstrap on FreeBSD 8/amd64 from what I can tell:

Sorry for that.  And, thanks for fixing the num_devices issue.

 /scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: 
 alloca. h: No such file or directory compilation terminated.
 
 
 % find /usr/include/ -name alloca.h
 %
 
 i.e., FreeBSD does not feature the alloca.h header and declares
 alloca() in stdlib.h.

The fix is simple enough; committed to trunk in r219771, as obvious:

commit a6f19a7c6b55f96d0c6dc65914857fc8e9b30aaf
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Jan 16 20:05:21 2015 +

libgomp: Don't use alloca.h.

libgomp/
* oacc-parallel.c: Don't include alloca.h.
(GOACC_parallel): Use gomp_alloca instead of alloca.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219771 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog   | 5 +
 libgomp/oacc-parallel.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 7c106d4..065dfd4 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-16  Thomas Schwinge  tho...@codesourcery.com
+
+   * oacc-parallel.c: Don't include alloca.h.
+   (GOACC_parallel): Use gomp_alloca instead of alloca.
+
 2015-01-16  Gerald Pfeifer  ger...@pfeifer.com
 
* target.c (num_devices): Guard with PLUGIN_SUPPORT.
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 6d5386b..b5e8060 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -34,7 +34,6 @@
 #include string.h
 #include stdarg.h
 #include assert.h
-#include alloca.h
 
 static int
 find_pset (int pos, size_t mapnum, unsigned short *kinds)
@@ -151,7 +150,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void 
*offload_table,
   tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
   false);
 
-  devaddrs = alloca (sizeof (void *) * mapnum);
+  devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i  mapnum; i++)
 devaddrs[i] = (void *) (tgt-list[i]-tgt-tgt_start
+ tgt-list[i]-tgt_offset);


Grüße,
 Thomas


signature.asc
Description: PGP signature


[PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,

Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
parameter, as discussed in https://gcc.gnu.org/PR64625.

But -- I now wonder whether that's actually the issue that has been
reported in the PR; doesn't that more look like a problem with the
__OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
anyone guess what's going on?

Anyway, as discussed in https://gcc.gnu.org/PR64625, I'd like to commit
this patch either way, OK?

commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Fri Jan 16 20:12:12 2015 +0100

[PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.

gcc/
* omp-low.c (offload_symbol_decl): Remove variable.
(get_offload_symbol_decl): Remove function.
(expand_omp_target): For BUILT_IN_GOMP_TARGET,
BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
instead of __OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
BUILT_IN_GOACC_UPDATE don't pass it at all.
libgomp/
* libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
(GOACC_parallel, GOACC_update): Remove const_void *offload_table
formal parameter.  Update all users.
* target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
Document unused formal parameter.
---
 gcc/omp-low.c   | 45 ++---
 libgomp/libgomp_g.h | 10 +-
 libgomp/oacc-parallel.c |  8 
 libgomp/target.c| 11 +--
 4 files changed, 32 insertions(+), 42 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index b7bf338..1589310 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -340,30 +340,6 @@ oacc_max_threads (omp_context *ctx)
 /* Holds offload tables with decls.  */
 vectree, va_gc *offload_funcs, *offload_vars;
 
-/* Holds a decl for __OFFLOAD_TABLE__.  */
-static GTY(()) tree offload_symbol_decl;
-
-/* Get the __OFFLOAD_TABLE__ symbol.  */
-static tree
-get_offload_symbol_decl (void)
-{
-  if (!offload_symbol_decl)
-{
-  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier (__OFFLOAD_TABLE__),
- ptr_type_node);
-  TREE_ADDRESSABLE (decl) = 1;
-  TREE_PUBLIC (decl) = 1;
-  DECL_EXTERNAL (decl) = 1;
-  DECL_WEAK (decl) = 1;
-  DECL_ATTRIBUTES (decl)
-   = tree_cons (get_identifier (weak),
-NULL_TREE, DECL_ATTRIBUTES (decl));
-  offload_symbol_decl = decl;
-}
-  return offload_symbol_decl;
-}
-
 /* Convenience function for calling scan_omp_1_op on tree operands.  */
 
 static inline tree
@@ -9119,16 +9095,31 @@ expand_omp_target (struct omp_region *region)
 }
 
   gimple g;
-  tree offload_table = get_offload_symbol_decl ();
   vectree *args;
   /* The maximum number used by any start_ix, without varargs.  */
-  unsigned int argcnt = 12;
+  unsigned int argcnt = 11;
 
   vec_alloc (args, argcnt);
   args-quick_push (device);
   if (offloaded)
 args-quick_push (build_fold_addr_expr (child_fn));
-  args-quick_push (build_fold_addr_expr (offload_table));
+  switch (start_ix)
+{
+case BUILT_IN_GOMP_TARGET:
+case BUILT_IN_GOMP_TARGET_DATA:
+case BUILT_IN_GOMP_TARGET_UPDATE:
+  /* This const void * is part of the current ABI, but we're not actually
+using it.  */
+  args-quick_push (build_zero_cst (ptr_type_node));
+  break;
+case BUILT_IN_GOACC_DATA_START:
+case BUILT_IN_GOACC_ENTER_EXIT_DATA:
+case BUILT_IN_GOACC_PARALLEL:
+case BUILT_IN_GOACC_UPDATE:
+  break;
+default:
+  gcc_unreachable ();
+}
   args-quick_push (t1);
   args-quick_push (t2);
   args-quick_push (t3);
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index c1e4e63..5e88d45 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -217,15 +217,15 @@ extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
 
-extern void GOACC_data_start (int, const void *,
- size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_start (int, size_t, void **, size_t *,
+ unsigned short *);
 extern void GOACC_data_end (void);
-extern void GOACC_enter_exit_data (int, const void *, size_t, void **,
+extern void GOACC_enter_exit_data (int, size_t, void **,
   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), const void *, size_t,
+extern void GOACC_parallel (int, void (*) (void *), size_t,
void **, size_t *, unsigned short *, int, int, int,
int, int, ...);

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Andreas Schwab
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++98  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++11  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++14  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 1
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++98  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++11  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++14  scan-rtl-dump-times 
expand (call [^n]*acc_on_device 0
FAIL: c-c++-common/goacc/acc_on_device-2-off.c scan-rtl-dump-times expand 
(call [^n]*acc_on_device 1
FAIL: gcc.dg/goacc/acc_on_device-1.c scan-rtl-dump-times expand (call 
[^n]*acc_on_device 4

You are making invalid assumptions about the form of a call pattern.

(call_insn 7 6 8 2 (set (reg:SI 0 %d0)
(call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])
(const_int 4 [0x4]))) 
/daten/aranym/gcc/gcc-20150116/gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c:19
 -1
 (nil)
(nil))

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 21:34, Thomas Schwinge wrote:
 On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
 parameter, as discussed in https://gcc.gnu.org/PR64625.
 
 But -- I now wonder whether that's actually the issue that has been
 reported in the PR; doesn't that more look like a problem with the
 __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
 the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
 anyone guess what's going on?

Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
since November without regressions.

  -- Ilya


Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Ilya Verbin
Hi!

On 15 Jan 21:20, Thomas Schwinge wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,
 which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
 been contributing!

Unfortunately, it broke offloading from shared libraries (I mean common libs
with NEEDED entries, not dlopened).  Such things are not covered by the
testsuite, that's why you missed this issue.  Here is a simple testcase:

+ test.c: +

int f_aaa (void);

int main ()
{
  int x = f_aaa ();
  #pragma omp target
x++;
  return x;
}

+ libaaa.c: +

int f_aaa (void)
{
  int x = 0;
  #pragma omp target
x = 10;
  return x;
}

++

$ gcc -fopenmp -shared -fPIC libaaa.c -o libaaa.so
$ gcc -fopenmp -L. -laaa test.c
$ ./a.out
libgomp: Target function wasn't mapped


The problem seems to be here:

-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+   struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i  num_offload_images; i++)
+  if (!device-offload_regions_registered
+   (device-type == image-type
+ || device-type == OFFLOAD_TARGET_TYPE_HOST))
 {
-  struct offload_image_descr *image = offload_images[i];
-  if (image-type == device-type)
-   device-register_image_func (image-host_table, image-target_data);
+  device-register_image_func (image-host_table, image-target_data);
+  device-offload_regions_registered = true;
 }
 }

So, you don't assume that a device can have multiple images from multiple libs?

  -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
On 86_64 Fedora 15, current gcc trunk only produces…

nm libgcc_s.so.1 | grep OFF
00215478 d _GLOBAL_OFFSET_TABLE_

and not __OFFLOAD_TABLE__,  The  libgcc_s.so.1 built on
x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
symbol.

On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote:
 On 16 Jan 21:34, Thomas Schwinge wrote:
 On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
 parameter, as discussed in https://gcc.gnu.org/PR64625.

 But -- I now wonder whether that's actually the issue that has been
 reported in the PR; doesn't that more look like a problem with the
 __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
 the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
 anyone guess what's going on?

 Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
 since November without regressions.

   -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 18:22, Jack Howarth wrote:
 On 86_64 Fedora 15, current gcc trunk only produces…
 
 nm libgcc_s.so.1 | grep OFF
 00215478 d _GLOBAL_OFFSET_TABLE_
 
 and not __OFFLOAD_TABLE__,  The  libgcc_s.so.1 built on
 x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
 symbol.
 
 On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote:
  Why do you think so?  __OFFLOAD_TABLE__ symbol lives in 
  libgcc/offloadstuff.c
  since November without regressions.

That's correct.
1. offloadstuff.c isn't linked into libgcc_s.so.1
2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
default configuration.

  -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
requirement for  __OFFLOAD_TABLE__ was not longer present and the
residual usages of it just had to be removed. The weak symbol on
darwin is fragile and seems to trip up on the existing code which
produces undefined symbols for ___OFFLOAD_TABLE__...

# nm e.50.1.o | grep OFF
 U ___OFFLOAD_TABLE__

rather than

$ nm e.50.1.o | grep OFF
 w __OFFLOAD_TABLE__

for all of the test cases.

On Fri, Jan 16, 2015 at 6:30 PM, Ilya Verbin iver...@gmail.com wrote:
 On 16 Jan 18:22, Jack Howarth wrote:
 On 86_64 Fedora 15, current gcc trunk only produces…

 nm libgcc_s.so.1 | grep OFF
 00215478 d _GLOBAL_OFFSET_TABLE_

 and not __OFFLOAD_TABLE__,  The  libgcc_s.so.1 built on
 x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
 symbol.

 On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote:
  Why do you think so?  __OFFLOAD_TABLE__ symbol lives in 
  libgcc/offloadstuff.c
  since November without regressions.

 That's correct.
 1. offloadstuff.c isn't linked into libgcc_s.so.1
 2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
 default configuration.

   -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 19:23, Jack Howarth wrote:
 As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
 requirement for  __OFFLOAD_TABLE__ was not longer present and the
 residual usages of it just had to be removed. The weak symbol on
 darwin is fragile and seems to trip up on the existing code which
 produces undefined symbols for ___OFFLOAD_TABLE__...
 
 # nm e.50.1.o | grep OFF
  U ___OFFLOAD_TABLE__
 
 rather than
 
 $ nm e.50.1.o | grep OFF
  w __OFFLOAD_TABLE__
 
 for all of the test cases.

I believe that the initial patch, which removes get_offload_symbol_decl, will
fix this.

  -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
Confirmed that this patch eliminates

[Bug libgomp/64625] ___OFFLOAD_TABLE__ symbol not produced on x86_64 darwin

and thus exposes

[Bug libgomp/64635] New: darwin produces
libgomp-plugin-host_nonshm.1.dylib but tries to load
libgomp-plugin-host_nonshm.so.1

The additional hack (which should be fixed with configure/Makefile.
changes to detect SHLIBEXT)...

@@ -1055,7 +1054,7 @@ static void
 gomp_target_init (void)
 {
   const char *prefix =libgomp-plugin-;
-  const char *suffix = .so.1;
+  const char *suffix = .1.dylib;
   const char *cur, *next;
   char *plugin_name;

to target.c in libgomp eliminates the second bug.

Native configuration is x86_64-apple-darwin14.1.0

=== libgomp tests ===

Schedule of variations:
unix/-m32
unix/-m64

Running target unix/-m32
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m32 ===

# of expected passes 5715
# of unsupported tests 281
Running target unix/-m64
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m64 ===

# of expected passes 5715
# of unsupported tests 281

=== libgomp Summary ===

# of expected passes 11430
# of unsupported tests 562


On Fri, Jan 16, 2015 at 3:34 PM, Thomas Schwinge
tho...@codesourcery.com wrote:
 Hi!

 On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
 In r219682, I have committed to trunk our current set of OpenACC changes,

 Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
 parameter, as discussed in https://gcc.gnu.org/PR64625.

 But -- I now wonder whether that's actually the issue that has been
 reported in the PR; doesn't that more look like a problem with the
 __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
 the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
 anyone guess what's going on?

 Anyway, as discussed in https://gcc.gnu.org/PR64625, I'd like to commit
 this patch either way, OK?

 commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
 Author: Thomas Schwinge tho...@codesourcery.com
 Date:   Fri Jan 16 20:12:12 2015 +0100

 [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.

 gcc/
 * omp-low.c (offload_symbol_decl): Remove variable.
 (get_offload_symbol_decl): Remove function.
 (expand_omp_target): For BUILT_IN_GOMP_TARGET,
 BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
 instead of __OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
 BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
 BUILT_IN_GOACC_UPDATE don't pass it at all.
 libgomp/
 * libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
 (GOACC_parallel, GOACC_update): Remove const_void *offload_table
 formal parameter.  Update all users.
 * target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
 Document unused formal parameter.
 ---
  gcc/omp-low.c   | 45 ++---
  

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-15 Thread Jeff Law

On 01/15/15 13:20, Thomas Schwinge wrote:

Hi!

In r219682, I have committed to trunk our current set of OpenACC changes,
which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
been contributing!

Note that this is an experimental feature, incomplete, and subject to
change in future versions of GCC.  We shall update -- and keep updated --
https://gcc.gnu.org/wiki/OpenACC, to track the current status.  (Please
come back to that page in a few days, it has not yet been updated.)

Please note that there are still a handful of patches pending (posted
weeks ago, need to ping) that are needed for nvptx offloading, so that's
not yet functional.
Definitely ping them.  We're trying hard to get things closed down and 
knowing what's still out there is very important.


jeff



Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-15 Thread Tobias Burnus

Hi Thomas,

thanks to you and all others involved for the OpenACC merge.


Attached is a patch which converts for Fortran '%s' into %qs, as 
mentioned to before. (It wasn't possible when the original patch was 
reviewed as the common diagnostic patches came later.)


Committed as Rev. 219694.


On the Fortran side: Compared with C/C++, support for acc cache seems 
to be missing (PR63865, contains draft-patch link). (Other PRs: the 
device_resident clause is not supported (PR63859) - I don't know whether 
it is supported in C/C++ or not. And there are two ICEs (PR63865, 
PR63858).) – Is some work on those planed on your side for GCC 5 or more 
likely not?


Tobias


Thomas Schwinge wrote:

In r219682, I have committed to trunk our current set of OpenACC changes,
which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
been contributing!

Note that this is an experimental feature, incomplete, and subject to
change in future versions of GCC.  We shall update -- and keep updated --
https://gcc.gnu.org/wiki/OpenACC, to track the current status.  (Please
come back to that page in a few days, it has not yet been updated.)

Please note that there are still a handful of patches pending (posted
weeks ago, need to ping) that are needed for nvptx offloading, so that's
not yet functional.

Here's the commit log.  The patch itself is too big to post inline, so
please find it attached, gzipped.


2015-01-15  Tobias Burnus  bur...@net-b.de

	* openmp.c (check_symbol_not_pointer, resolve_oacc_data_clauses,
	resolve_oacc_deviceptr_clause, resolve_omp_clauses,
	gfc_resolve_oacc_declare): Replace '%s' by %qs.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 005739b..422e977 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1457,7 +1457,7 @@ gfc_match_oacc_routine (void)
   if (!sym-attr.external  !sym-attr.function  !sym-attr.subroutine)
 {
   gfc_error (Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid
-		  function name '%s', sym-name);
+		  function name %qs, sym-name);
   gfc_current_locus = old_loc;
   return MATCH_ERROR;
 }
@@ -2649,29 +2649,29 @@ static void
 check_symbol_not_pointer (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym-ts.type == BT_DERIVED  sym-attr.pointer)
-gfc_error (POINTER object '%s' of derived type in %s clause at %L,
+gfc_error (POINTER object %qs of derived type in %s clause at %L,
 	   sym-name, name, loc);
   if (sym-ts.type == BT_DERIVED  sym-attr.cray_pointer)
-gfc_error (Cray pointer object of derived type '%s' in %s clause at %L,
+gfc_error (Cray pointer object of derived type %qs in %s clause at %L,
 	   sym-name, name, loc);
   if (sym-ts.type == BT_DERIVED  sym-attr.cray_pointee)
-gfc_error (Cray pointee object of derived type '%s' in %s clause at %L,
+gfc_error (Cray pointee object of derived type %qs in %s clause at %L,
 	   sym-name, name, loc);
 
   if ((sym-ts.type == BT_ASSUMED  sym-attr.pointer)
   || (sym-ts.type == BT_CLASS  CLASS_DATA (sym)
 	   CLASS_DATA (sym)-attr.pointer))
-gfc_error (POINTER object '%s' of polymorphic type in %s clause at %L,
+gfc_error (POINTER object %qs of polymorphic type in %s clause at %L,
 	   sym-name, name, loc);
   if ((sym-ts.type == BT_ASSUMED  sym-attr.cray_pointer)
   || (sym-ts.type == BT_CLASS  CLASS_DATA (sym)
 	   CLASS_DATA (sym)-attr.cray_pointer))
-gfc_error (Cray pointer object of polymorphic type '%s' in %s clause at %L,
+gfc_error (Cray pointer object of polymorphic type %qs in %s clause at %L,
 	   sym-name, name, loc);
   if ((sym-ts.type == BT_ASSUMED  sym-attr.cray_pointee)
   || (sym-ts.type == BT_CLASS  CLASS_DATA (sym)
 	   CLASS_DATA (sym)-attr.cray_pointee))
-gfc_error (Cray pointee object of polymorphic type '%s' in %s clause at %L,
+gfc_error (Cray pointee object of polymorphic type %qs in %s clause at %L,
 	   sym-name, name, loc);
 }
 
@@ -2681,14 +2681,14 @@ static void
 check_array_not_assumed (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym-as  sym-as-type == AS_ASSUMED_SIZE)
-gfc_error (Assumed size array '%s' in %s clause at %L,
+gfc_error (Assumed size array %qs in %s clause at %L,
 	   sym-name, name, loc);
   if (sym-as  sym-as-type == AS_ASSUMED_RANK)
-gfc_error (Assumed rank array '%s' in %s clause at %L,
+gfc_error (Assumed rank array %qs in %s clause at %L,
 	   sym-name, name, loc);
   if (sym-as  sym-as-type == AS_DEFERRED  sym-attr.pointer
!sym-attr.contiguous)
-gfc_error (Noncontiguous deferred shape array '%s' in %s clause at %L,
+gfc_error (Noncontiguous deferred shape array %qs in %s clause at %L,
 	   sym-name, name, loc);
 }
 
@@ -2696,12 +2696,12 @@ static void
 resolve_oacc_data_clauses (gfc_symbol *sym, locus loc, const char *name)
 {
   if (sym-ts.type == BT_DERIVED  sym-attr.allocatable)
-gfc_error (ALLOCATABLE object '%s' of derived type in %s clause at %L,
+gfc_error