Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Mon, 20 Apr 2015, Thomas Schwinge wrote: Gerald, is it OK to commit the following to update GCC 5 changes' »New Languages and Language specific improvements« section? Sure thing. Gerald
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi! On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! Note that this is an experimental feature, incomplete, and subject to change in future versions of GCC. We shall update -- and keep updated -- https://gcc.gnu.org/wiki/OpenACC, to track the current status. (This has now happened, finally...) Gerald, is it OK to commit the following to update GCC 5 changes' »New Languages and Language specific improvements« section? Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.109 diff -u -p -r1.109 changes.html --- htdocs/gcc-5/changes.html 20 Apr 2015 08:22:35 - 1.109 +++ htdocs/gcc-5/changes.html 20 Apr 2015 14:20:54 - @@ -193,6 +193,12 @@ liCard emulator./li /ul /li +li id=openacc + GCC 5 includes a preliminary implementation of the OpenACC 2.0a + specification. OpenACC is intended for programming accelerator devices + such as GPUs. See a href=https://gcc.gnu.org/wiki/OpenACC;the OpenACC + wiki page/a for more information. +/li /ul Grüße, Thomas signature.asc Description: PGP signature
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch) (PR65742)
On Tue, 14 Apr 2015 15:15:02 +0100 Julian Brown jul...@codesourcery.com wrote: On Wed, 8 Apr 2015 17:58:56 +0300 Ilya Verbin iver...@gmail.com wrote: On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote: This version is mostly the same as the last posted version but has a tweak in GOACC_parallel to account for the new splay tree arrangement for target functions: - tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start; + tgt_fn = (void (*)) tgt_fn_key-tgt_offset; Have there been any other changes I might have missed? No. It passes libgomp testing on NVPTX. OK? Have you tested it with disabled offloading? I see several regressions: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test I think there may be multiple issues here. The attached patch addresses one -- acc_device_type not distinguishing between offloaded and host code with the host_nonshm plugin. The patch appears to fix the original issue after all: I've re-run tests with host==target and the failures no longer appear. Also the same has been noted by Dominique d'Humieres in PR65742. The other problem is that it appears that the ACC_DEVICE_TYPE environment variable is not getting set properly on the target for (any of) the OpenACC tests: this means a lot of the time the wrong plugin is being tested, and means that the above tests (and several others) still fail. That will apparently need some more engineering (on our part). Fixing this turns out to require more DejaGNU-fu than I have: AFAICT, setting a per-test environment variable from an .exp file can't easily be done at present. The potentially useful-looking {dg-}set-target-env-var doesn't look quite suitable for this purpose, and besides which doesn't actually seem to be implemented for host != target anyway. (At least, if this fragment of gcc-dg.exp is anything to go by: if { [info exists set_target_env_var] \ [llength $set_target_env_var] != 0 } { if { [is_remote target] } { return [list unsupported ] } ... ). So: OK for trunk? Thanks, Julian ChangeLog libgomp/ * oacc-init.c (acc_on_device): Check whether we're in an offloaded region for host_nonshm plugin. * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set nonshm_exec flag in thread-local data. (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local data for host_nonshm plugin. (+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data for host_nonshm plugin. * plugin/plugin-host.h: New.
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
(Not asking for review just yet, JFYI.) This is not a review! but the patch fixes PR65742. Dominique
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, 8 Apr 2015 17:58:56 +0300 Ilya Verbin iver...@gmail.com wrote: On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote: This version is mostly the same as the last posted version but has a tweak in GOACC_parallel to account for the new splay tree arrangement for target functions: - tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start; + tgt_fn = (void (*)) tgt_fn_key-tgt_offset; Have there been any other changes I might have missed? No. It passes libgomp testing on NVPTX. OK? Have you tested it with disabled offloading? I see several regressions: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test I think there may be multiple issues here. The attached patch addresses one -- acc_device_type not distinguishing between offloaded and host code with the host_nonshm plugin. The other problem is that it appears that the ACC_DEVICE_TYPE environment variable is not getting set properly on the target for (any of) the OpenACC tests: this means a lot of the time the wrong plugin is being tested, and means that the above tests (and several others) still fail. That will apparently need some more engineering (on our part). (Not asking for review just yet, JFYI.) Julian ChangeLog libgomp/ * oacc-init.c (acc_on_device): Check whether we're in an offloaded region for host_nonshm plugin. * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set nonshm_exec flag in thread-local data. (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local data for host_nonshm plugin. (+GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data for host_nonshm plugin. * plugin/plugin-host.h: New.Index: libgomp/oacc-init.c === --- libgomp/oacc-init.c (revision 221922) +++ libgomp/oacc-init.c (working copy) @@ -29,6 +29,7 @@ #include libgomp.h #include oacc-int.h #include openacc.h +#include plugin/plugin-host.h #include assert.h #include stdlib.h #include strings.h @@ -548,7 +549,14 @@ ialias (acc_set_device_num) int acc_on_device (acc_device_t dev) { - if (acc_get_device_type () == acc_device_host_nonshm) + struct goacc_thread *thr = goacc_thread (); + + /* We only want to appear to be the host_nonshm plugin from offloaded + code -- i.e. within a parallel region. Test a flag set by the + openacc_parallel hook of the host_nonshm plugin to determine that. */ + if (acc_get_device_type () == acc_device_host_nonshm + thr thr-target_tls + ((struct nonshm_thread *)thr-target_tls)-nonshm_exec) return dev == acc_device_host_nonshm || dev == acc_device_not_host; /* Just rely on the compiler builtin. */ Index: libgomp/plugin/plugin-host.c === --- libgomp/plugin/plugin-host.c (revision 221922) +++ libgomp/plugin/plugin-host.c (working copy) @@ -44,6 +44,7 @@ #include stdlib.h #include string.h #include stdio.h +#include stdbool.h #ifdef HOST_NONSHM_PLUGIN #define STATIC @@ -55,6 +56,10 @@ #define SELF host: #endif +#ifdef HOST_NONSHM_PLUGIN +#include plugin-host.h +#endif + STATIC const char * GOMP_OFFLOAD_get_name (void) { @@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn void *targ_mem_desc __attribute__ ((unused))) { #ifdef HOST_NONSHM_PLUGIN + struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread (); + thd-nonshm_exec = true; fn (devaddrs); + thd-nonshm_exec = false; #else fn (hostaddrs); #endif @@ -232,11 +240,20 @@ STATIC void * GOMP_OFFLOAD_openacc_create_thread_data (int ord __attribute__ ((unused))) { +#ifdef HOST_NONSHM_PLUGIN + struct nonshm_thread *thd += GOMP_PLUGIN_malloc (sizeof (struct nonshm_thread)); + thd-nonshm_exec = false; + return thd; +#else return NULL; +#endif } STATIC void -GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data - __attribute__ ((unused))) +GOMP_OFFLOAD_openacc_destroy_thread_data (void *tls_data) { +#ifdef HOST_NONSHM_PLUGIN + free (tls_data); +#endif } Index: libgomp/plugin/plugin-host.h === --- libgomp/plugin/plugin-host.h (revision 0) +++ libgomp/plugin/plugin-host.h (revision 0) @@ -0,0 +1,37 @@ +/* OpenACC Runtime Library: acc_device_host, acc_device_host_nonshm. + + Copyright (C) 2015 Free Software Foundation, Inc. + + Contributed by Mentor Embedded. + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version.
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Tue, 7 Apr 2015 17:26:45 +0200 Jakub Jelinek ja...@redhat.com wrote: On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote: On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote: LGTM with proper ChangeLog entry. I've commited this patch into trunk. Julian, you probably want to update the nvptx plugin. Note that as the number of P1s without posted fixes is now zero, it is likely RC1 will be done this week, so if you want nvptx working in GCC 5, please post a fix as soon as possible. This version is mostly the same as the last posted version but has a tweak in GOACC_parallel to account for the new splay tree arrangement for target functions: - tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start; + tgt_fn = (void (*)) tgt_fn_key-tgt_offset; Have there been any other changes I might have missed? It passes libgomp testing on NVPTX. OK? Thanks, Juliancommit ac06b5e25e170061bb9855b9ea4b8e5696816bf1 Author: Julian Brown jul...@codesourcery.com Date: Tue Apr 7 09:23:58 2015 -0700 NVPTX load/unload and init-rework patch. diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c index 02c44b6..dbc68bc 100644 --- a/gcc/config/nvptx/mkoffload.c +++ b/gcc/config/nvptx/mkoffload.c @@ -839,6 +839,7 @@ process (FILE *in, FILE *out) { const char *input = read_file (in); Token *tok = tokenize (input); + unsigned int nvars = 0, nfuncs = 0; do tok = parse_file (tok); @@ -850,16 +851,17 @@ process (FILE *in, FILE *out) write_stmts (out, rev_stmts (fns)); fprintf (out, ;\n\n); fprintf (out, static const char *var_mappings[] = {\n); - for (id_map *id = var_ids; id; id = id-next) + for (id_map *id = var_ids; id; id = id-next, nvars++) fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : ); fprintf (out, };\n\n); fprintf (out, static const char *func_mappings[] = {\n); - for (id_map *id = func_ids; id; id = id-next) + for (id_map *id = func_ids; id; id = id-next, nfuncs++) fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : ); fprintf (out, };\n\n); fprintf (out, static const void *target_data[] = {\n); - fprintf (out, ptx_code, var_mappings, func_mappings\n); + fprintf (out, ptx_code, (void*) %u, var_mappings, (void*) %u, + func_mappings\n, nvars, nfuncs); fprintf (out, };\n\n); fprintf (out, extern void GOMP_offload_register (const void *, int, void *);\n); diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index a1d42c5..5272f01 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -655,9 +655,6 @@ struct target_mem_desc { /* Corresponding target device descriptor. */ struct gomp_device_descr *device_descr; - /* Memory mapping info for the thread that created this descriptor. */ - struct splay_tree_s *mem_map; - /* List of splay keys to remove (or decrease refcount) at the end of region. */ splay_tree_key list[]; @@ -691,18 +688,6 @@ typedef struct acc_dispatch_t /* This is guarded by the lock in the outer struct gomp_device_descr. */ struct target_mem_desc *data_environ; - /* Extra information required for a device instance by a given target. */ - /* This is guarded by the lock in the outer struct gomp_device_descr. */ - void *target_data; - - /* Open or close a device instance. */ - void *(*open_device_func) (int n); - int (*close_device_func) (void *h); - - /* Set or get the device number. */ - int (*get_device_num_func) (void); - void (*set_device_num_func) (int); - /* Execute. */ void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *, unsigned short *, int, int, int, int, void *); @@ -720,7 +705,7 @@ typedef struct acc_dispatch_t void (*async_set_async_func) (int); /* Create/destroy TLS data. */ - void *(*create_thread_data_func) (void *); + void *(*create_thread_data_func) (int); void (*destroy_thread_data_func) (void *); /* NVIDIA target specific routines. */ diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c index 08b7c5e..1f5827e 100644 --- a/libgomp/oacc-async.c +++ b/libgomp/oacc-async.c @@ -26,7 +26,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ - +#include assert.h #include openacc.h #include libgomp.h #include oacc-int.h @@ -37,13 +37,23 @@ acc_async_test (int async) if (async acc_async_sync) gomp_fatal (invalid async argument: %d, async); - return base_dev-openacc.async_test_func (async); + struct goacc_thread *thr = goacc_thread (); + + if (!thr || !thr-dev) +gomp_fatal (no device active); + + return thr-dev-openacc.async_test_func (async); } int acc_async_test_all (void) { - return base_dev-openacc.async_test_all_func (); + struct goacc_thread *thr = goacc_thread (); + + if (!thr || !thr-dev) +gomp_fatal (no device active); + + return thr-dev-openacc.async_test_all_func (); } void @@ -52,19 +62,34 @@ acc_wait (int async) if (async acc_async_sync)
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 08, 2015 at 15:31:42 +0100, Julian Brown wrote: This version is mostly the same as the last posted version but has a tweak in GOACC_parallel to account for the new splay tree arrangement for target functions: - tgt_fn = (void (*)) tgt_fn_key-tgt-tgt_start; + tgt_fn = (void (*)) tgt_fn_key-tgt_offset; Have there been any other changes I might have missed? No. It passes libgomp testing on NVPTX. OK? Have you tested it with disabled offloading? I see several regressions: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 08, 2015 at 03:31:42PM +0100, Julian Brown wrote: It passes libgomp testing on NVPTX. OK? Please write a proper ChangeLog entry for it. Ok with that. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, 8 Apr 2015 17:58:56 +0300 Ilya Verbin iver...@gmail.com wrote: Have you tested it with disabled offloading? I see several regressions: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test No -- thanks for the note. I've committed the patch now, but I'll try to get to looking at these in the next day or two (it's probably something relatively minor, I guess). Julian
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, Apr 06, 2015 at 03:45:57PM +0300, Ilya Verbin wrote: On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote: LGTM with proper ChangeLog entry. I've commited this patch into trunk. Julian, you probably want to update the nvptx plugin. Note that as the number of P1s without posted fixes is now zero, it is likely RC1 will be done this week, so if you want nvptx working in GCC 5, please post a fix as soon as possible. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote: LGTM with proper ChangeLog entry. I've commited this patch into trunk. Julian, you probably want to update the nvptx plugin. gcc/ * config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call GOMP_offload_unregister from the destructor. libgomp/ * libgomp-plugin.h (struct mapping_table): Replace with addr_pair. * libgomp.h (struct gomp_memory_mapping): Remove. (struct target_mem_desc): Change type of mem_map from gomp_memory_mapping * to splay_tree_s *. (struct gomp_device_descr): Remove register_image_func, get_table_func. Add load_image_func, unload_image_func. Change type of mem_map from gomp_memory_mapping to splay_tree_s. Remove offload_regions_registered. (gomp_init_tables): Remove. (gomp_free_memmap): Change type of argument from gomp_memory_mapping * to splay_tree_s *. * libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister. * oacc-host.c (host_dispatch): Do not initialize register_image_func, get_table_func, mem_map.is_initialized, mem_map.splay_tree.root, offload_regions_registered. Initialize load_image_func, unload_image_func, mem_map.root. (goacc_host_init): Do not initialize host_dispatch.mem_map.lock. * oacc-init.c (lazy_open): Don't call gomp_init_tables. (acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's. * oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of gomp_memory_mapping *. Use dev's lock and splay_tree. (lookup_dev): Use dev's lock. (acc_deviceptr): Pass dev to lookup_host instead of mem_map. (acc_is_present): Likewise. (acc_map_data): Likewise. (acc_unmap_data): Likewise. Use dev's lock. (present_create_copy): Likewise. (delete_copyout): Pass dev to lookup_host instead of mem_map. (update_dev_host): Likewise. (gomp_acc_remove_pointer): Likewise. Use dev's lock. * oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree. * plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove. (GOMP_OFFLOAD_get_table): Remove (GOMP_OFFLOAD_load_image): New function. (GOMP_OFFLOAD_unload_image): New function. * target.c (register_lock): New mutex for offload image registration. (num_devices): Do not guard with PLUGIN_SUPPORT. (gomp_realloc_unlock): New static function. (gomp_map_vars_existing): Add device descriptor argument. Unlock mutex before gomp_fatal. (gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's. Pass devicep to gomp_map_vars_existing. Unlock mutex before gomp_fatal. (gomp_copy_from_async): Use dev's lock and splay_tree instead of mem_map's. (gomp_unmap_vars): Likewise. (gomp_update): Remove gomp_memory_mapping argument. Use dev's lock and splay_tree instead of mm's. Unlock mutex before gomp_fatal. (gomp_offload_image_to_device): New static function. (GOMP_offload_register): Add mutex lock. Call gomp_offload_image_to_device for all initialized devices. Replace gomp_realloc with gomp_realloc_unlock. (GOMP_offload_unregister): New function. (gomp_init_tables): Replace with gomp_init_device. Replace a call to get_table_func from the plugin with calls to init_device_func and gomp_offload_image_to_device. (gomp_free_memmap): Change type of argument from gomp_memory_mapping * to splay_tree_s *. (GOMP_target): Do not call gomp_init_tables. Use dev's lock and splay_tree instead of mem_map's. Unlock mutex before gomp_fatal. (GOMP_target_data): Do not call gomp_init_tables. (GOMP_target_update): Likewise. Remove argument from gomp_update. (gomp_load_plugin_for_device): Replace register_image and get_table with load_image and unload_image in DLSYM (). (gomp_register_images_for_device): Remove function. (gomp_target_init): Do not initialize current_device.mem_map.*, current_device.offload_regions_registered. Remove call to gomp_register_images_for_device. Do not free offload_images and num_offload_images. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp: Include map. (AddrVect, DevAddrVect, ImgDevAddrMap): New typedefs. (num_devices, num_images, address_table): New static vars. (num_libraries, lib_descrs): Remove static vars. (set_mic_lib_path): Rename to ... (init): ... this. Allocate address_table and get num_devices. (GOMP_OFFLOAD_get_num_devices): return num_devices. (load_lib_and_get_table): Remove static function. (offload_image): New static function. (GOMP_OFFLOAD_get_table): Remove function.
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote: On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote: +/* Similar to gomp_fatal, but release mutexes before. */ + +static void +gomp_fatal_unlock (const char *fmt, ...) +{ + int i; + va_list list; + + for (i = 0; i num_devices; i++) +gomp_mutex_unlock (devices[i].lock); This is wrong. Calling gomp_mutex_unlock on a lock that you don't have locked is undefined behavior. You really should unlock it in the caller which should be aware which 0/1/2 locks it holds. I was worried about the following scenario: 1. Thread 1 in GOMP_target locks device 1. 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal. 3. GOMP_offload_unregister will wait for device 1, even device 2 is unlocked. Anyway, it was a bad idea to unlock mutexes from non-owner thread. Here is patch, which unlocks proper mutexes in the caller, as you suggested. make check-target-libgomp passed. diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c index f93007c..e101f93 100644 --- a/gcc/config/i386/intelmic-mkoffload.c +++ b/gcc/config/i386/intelmic-mkoffload.c @@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler) #ifdef __cplusplus\n extern \C\\n #endif\n - void GOMP_offload_register (void *, int, void *);\n\n + void GOMP_offload_register (void *, int, void *);\n + void GOMP_offload_unregister (void *, int, void *);\n\n __attribute__((constructor))\n static void\n init (void)\n {\n GOMP_offload_register (__OFFLOAD_TABLE__, %d, __offload_target_data);\n + }\n\n, GOMP_DEVICE_INTEL_MIC); + + fprintf (src_file, + __attribute__((destructor))\n + static void\n + fini (void)\n + {\n +GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, __offload_target_data);\n }\n, GOMP_DEVICE_INTEL_MIC); + fclose (src_file); unsigned new_argc = 0; diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index d9cbff5..1072ae4 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -51,14 +51,12 @@ enum offload_target_type OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 }; -/* Auxiliary struct, used for transferring a host-target address range mapping - from plugin to libgomp. */ -struct mapping_table +/* Auxiliary struct, used for transferring pairs of addresses from plugin + to libgomp. */ +struct addr_pair { - uintptr_t host_start; - uintptr_t host_end; - uintptr_t tgt_start; - uintptr_t tgt_end; + uintptr_t start; + uintptr_t end; }; /* Miscellaneous functions. */ diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 3089401..a1d42c5 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -224,7 +224,6 @@ struct gomp_team_state }; struct target_mem_desc; -struct gomp_memory_mapping; /* These are the OpenMP 4.0 Internal Control Variables described in section 2.3.1. Those described as having one copy per task are @@ -657,7 +656,7 @@ struct target_mem_desc { struct gomp_device_descr *device_descr; /* Memory mapping info for the thread that created this descriptor. */ - struct gomp_memory_mapping *mem_map; + struct splay_tree_s *mem_map; /* List of splay keys to remove (or decrease refcount) at the end of region. */ @@ -683,20 +682,6 @@ struct splay_tree_key_s { #include splay-tree.h -/* Information about mapped memory regions (per device/context). */ - -struct gomp_memory_mapping -{ - /* Mutex for operating with the splay tree and other shared structures. */ - gomp_mutex_t lock; - - /* True when tables have been added to this memory map. */ - bool is_initialized; - - /* Splay tree containing information about mapped memory regions. */ - struct splay_tree_s splay_tree; -}; - typedef struct acc_dispatch_t { /* This is a linked list of data mapped using the @@ -773,19 +758,18 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*register_image_func) (void *, void *); void (*init_device_func) (int); void (*fini_device_func) (int); - int (*get_table_func) (int, struct mapping_table **); + int (*load_image_func) (int, void *, struct addr_pair **); + void (*unload_image_func) (int, void *); void *(*alloc_func) (int, size_t); void (*free_func) (int, void *); void *(*dev2host_func) (int, void *, const void *, size_t); void *(*host2dev_func) (int, void *, const void *, size_t); void (*run_func) (int, void *, void *); - /* Memory-mapping info for this device instance. */ - /* Uses a separate lock. */ - struct gomp_memory_mapping mem_map; + /* Splay tree containing information about mapped memory regions. */ + struct splay_tree_s mem_map; /* Mutex for the mutable data. */ gomp_mutex_t lock; @@
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote: On Wed, Apr 01, 2015 at 07:21:47 +0200, Jakub Jelinek wrote: On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote: +/* Similar to gomp_fatal, but release mutexes before. */ + +static void +gomp_fatal_unlock (const char *fmt, ...) +{ + int i; + va_list list; + + for (i = 0; i num_devices; i++) +gomp_mutex_unlock (devices[i].lock); This is wrong. Calling gomp_mutex_unlock on a lock that you don't have locked is undefined behavior. You really should unlock it in the caller which should be aware which 0/1/2 locks it holds. I was worried about the following scenario: 1. Thread 1 in GOMP_target locks device 1. 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal. 3. GOMP_offload_unregister will wait for device 1, even device 2 is unlocked. How is that different from 1. Thread 1 in GOMP_target locks device 1. 2. Thread 2 calls exit. ? I mean when you unlock the device and register locks if you own them before gomp_fatal. Anyway, it was a bad idea to unlock mutexes from non-owner thread. Here is patch, which unlocks proper mutexes in the caller, as you suggested. make check-target-libgomp passed. LGTM with proper ChangeLog entry. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 01, 2015 at 15:20:25 +0200, Jakub Jelinek wrote: On Wed, Apr 01, 2015 at 04:14:05PM +0300, Ilya Verbin wrote: I was worried about the following scenario: 1. Thread 1 in GOMP_target locks device 1. 2. Thread 2 in GOMP_target locks device 2 and calls gomp_fatal. 3. GOMP_offload_unregister will wait for device 1, even device 2 is unlocked. How is that different from 1. Thread 1 in GOMP_target locks device 1. 2. Thread 2 calls exit. ? I mean when you unlock the device and register locks if you own them before gomp_fatal. Yeah, it's the same situation. Here is patch, which unlocks proper mutexes in the caller, as you suggested. make check-target-libgomp passed. LGTM with proper ChangeLog entry. When should I commit it into trunk? Without the corresponding PTX part, offloading to PTX will not work. -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, Mar 30, 2015 at 22:42:51 +0100, Julian Brown wrote: On Mon, 30 Mar 2015 18:42:02 +0200 Jakub Jelinek ja...@redhat.com wrote: But the one Julian posted doesn't apply on top of your patch. If there is any interdiff needed on top of your patch, can it be posted against trunk + your patch? Here's a version of my patch against trunk and Ilya's latest patch (hopefully!). Tests look OK (libgomp + PTX). Thanks for rebasing! On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote: +/* Insert mapping of host - target address pairs to splay tree. */ + +static void +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep, + struct addr_pair *host_addr, + struct addr_pair *tgt_addr) +{ + struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt)); + tgt-refcount = 1; + tgt-array = gomp_malloc (sizeof (*tgt-array)); + tgt-tgt_start = tgt_addr-start; + tgt-tgt_end = tgt_addr-end; + tgt-to_free = NULL; + tgt-list_count = 0; + tgt-device_descr = devicep; + splay_tree_node node = tgt-array; + splay_tree_key k = node-key; + k-host_start = host_addr-start; + k-host_end = host_addr-end; + k-tgt_offset = 0; + k-refcount = 1; + k-copy_from = false; + k-tgt = tgt; + node-left = NULL; + node-right = NULL; + splay_tree_insert (devicep-mem_map, node); +} What is the reason to register and allocate these one at a time, rather than using one struct target_mem_desc with one tgt-array for all splay tree nodes registered from one image? Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it clear it is special) and just use tgt_offset relative to that (i.e. absolute), but having to malloc each node individually and having to malloc a target_mem_desc for each one sounds expensive. Everything is freed just once anyway, isn't it? Here is WIP patch, does this look like what you suggested? It works fine with functions, however I'm not sure what to do with variables. Will gomp_map_vars work when tgt_start and tgt_end are equal to 0? @@ -654,6 +727,18 @@ void GOMP_offload_register (void *host_table, enum offload_target_type target_type, void *target_data) { + int i; + gomp_mutex_lock (register_lock); + + /* Load image to all initialized devices. */ + for (i = 0; i num_devices; i++) +{ + struct gomp_device_descr *devicep = devices[i]; + if (devicep-type == target_type devicep-is_initialized) + gomp_offload_image_to_device (devicep, host_table, target_data); Shouldn't either this function, or gomp_offload_image_to_device lock also devicep-lock mutex and unlock at the end? Where exactly I guess depends on if the devicep-* hook calls should be guarded with the mutex or not. If yes, it should be this function and gomp_init_device. + if (devicep-type != target_type || !devicep-is_initialized) + continue; + Similarly. I've added lock/unlock to GOMP_offload_register and GOMP_offload_unregister. All calls to gomp_init_device were already guarded. + devicep-unload_image_func (devicep-target_id, target_data); + + /* Remove mapping from splay tree. */ + for (j = 0; j num_funcs; j++) + { + struct splay_tree_key_s k; + k.host_start = (uintptr_t) host_func_table[j]; + k.host_end = k.host_start + 1; + splay_tree_remove (devicep-mem_map, k); + } + + for (j = 0; j num_vars; j++) + { + struct splay_tree_key_s k; + k.host_start = (uintptr_t) host_var_table[j*2]; + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1]; + splay_tree_remove (devicep-mem_map, k); + } +} Aren't you leaking here all the tgt-array and tgt allocations here? Though, if you change it to just two allocations (one tgt, one array), you'd need to free just once. You're right. I've fixed this for functions, variables are WIP. diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c index f93007c..e101f93 100644 --- a/gcc/config/i386/intelmic-mkoffload.c +++ b/gcc/config/i386/intelmic-mkoffload.c @@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler) #ifdef __cplusplus\n extern \C\\n #endif\n - void GOMP_offload_register (void *, int, void *);\n\n + void GOMP_offload_register (void *, int, void *);\n + void GOMP_offload_unregister (void *, int, void *);\n\n __attribute__((constructor))\n static void\n init (void)\n {\n GOMP_offload_register (__OFFLOAD_TABLE__, %d, __offload_target_data);\n + }\n\n, GOMP_DEVICE_INTEL_MIC); + + fprintf (src_file, + __attribute__((destructor))\n + static void\n + fini (void)\n + {\n +GOMP_offload_unregister (__OFFLOAD_TABLE__,
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote: What is the reason to register and allocate these one at a time, rather than using one struct target_mem_desc with one tgt-array for all splay tree nodes registered from one image? Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it clear it is special) and just use tgt_offset relative to that (i.e. absolute), but having to malloc each node individually and having to malloc a target_mem_desc for each one sounds expensive. Everything is freed just once anyway, isn't it? Here is WIP patch, does this look like what you suggested? It works fine with functions, however I'm not sure what to do with variables. Will gomp_map_vars work when tgt_start and tgt_end are equal to 0? Can you explain what you are afraid of? The mapped images (both their mapping and unmapping) are done in pairs, and in a valid program the addresses shouldn't be already mapped when the image is mapped in etc. So, for gomp_map_vars, the var allocations should just be the pre-existing mappings, i.e. splay_tree_key n = splay_tree_lookup (mm-splay_tree, cur_node); if (n) { tgt-list[i] = n; gomp_map_vars_existing (n, cur_node, kind typemask); } case and if (is_target) { for (i = 0; i mapnum; i++) { if (tgt-list[i] == NULL) cur_node.tgt_offset = (uintptr_t) NULL; else cur_node.tgt_offset = tgt-list[i]-tgt-tgt_start + tgt-list[i]-tgt_offset; /* FIXME: see above FIXME comment. */ devicep-host2dev_func (devicep-target_id, (void *) (tgt-tgt_start + i * sizeof (void *)), (void *) cur_node.tgt_offset, sizeof (void *)); } } at the end. tgt-list[i] will be non-NULL, tgt-list[i]-tgt-tgt_start will be 0, but tgt-list[i]-tgt_offset will be absolute and so should DTRT. + for (i = 0; i num_vars; i++) +{ + struct addr_pair host_addr; + host_addr.start = (uintptr_t) host_var_table[i*2]; + host_addr.end = host_addr.start + (uintptr_t) host_var_table[i*2+1]; Formatting, spaces around + or *. But, as said earlier, I don't see why this wouldn't work for variables too. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Apr 01, 2015 at 02:53:28AM +0300, Ilya Verbin wrote: +/* Similar to gomp_fatal, but release mutexes before. */ + +static void +gomp_fatal_unlock (const char *fmt, ...) +{ + int i; + va_list list; + + for (i = 0; i num_devices; i++) +gomp_mutex_unlock (devices[i].lock); This is wrong. Calling gomp_mutex_unlock on a lock that you don't have locked is undefined behavior. You really should unlock it in the caller which should be aware which 0/1/2 locks it holds. + gomp_mutex_unlock (register_lock); Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Tue, Mar 31, 2015 at 15:07:58 +0200, Jakub Jelinek wrote: On Tue, Mar 31, 2015 at 03:52:06PM +0300, Ilya Verbin wrote: What is the reason to register and allocate these one at a time, rather than using one struct target_mem_desc with one tgt-array for all splay tree nodes registered from one image? Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it clear it is special) and just use tgt_offset relative to that (i.e. absolute), but having to malloc each node individually and having to malloc a target_mem_desc for each one sounds expensive. Everything is freed just once anyway, isn't it? Here is WIP patch, does this look like what you suggested? It works fine with functions, however I'm not sure what to do with variables. Will gomp_map_vars work when tgt_start and tgt_end are equal to 0? Can you explain what you are afraid of? The mapped images (both their mapping and unmapping) are done in pairs, and in a valid program the addresses shouldn't be already mapped when the image is mapped in etc. So, for gomp_map_vars, the var allocations should just be the pre-existing mappings, i.e. splay_tree_key n = splay_tree_lookup (mm-splay_tree, cur_node); if (n) { tgt-list[i] = n; gomp_map_vars_existing (n, cur_node, kind typemask); } case and if (is_target) { for (i = 0; i mapnum; i++) { if (tgt-list[i] == NULL) cur_node.tgt_offset = (uintptr_t) NULL; else cur_node.tgt_offset = tgt-list[i]-tgt-tgt_start + tgt-list[i]-tgt_offset; /* FIXME: see above FIXME comment. */ devicep-host2dev_func (devicep-target_id, (void *) (tgt-tgt_start + i * sizeof (void *)), (void *) cur_node.tgt_offset, sizeof (void *)); } } at the end. tgt-list[i] will be non-NULL, tgt-list[i]-tgt-tgt_start will be 0, but tgt-list[i]-tgt_offset will be absolute and so should DTRT. Ok, thanks for the clarification! Here is the new patch with variables. Unfortunately I see 4 fails in make check-target-libgomp with PTX patch applied on top, but with disabled offloading to PTX. Julian, have you seen them? All other tests passed with intelmic emul. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test acc_on_device-1.c aborts here: /* Offloaded. */ #pragma acc parallel { if (acc_on_device (acc_device_none)) abort (); diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c index f93007c..e101f93 100644 --- a/gcc/config/i386/intelmic-mkoffload.c +++ b/gcc/config/i386/intelmic-mkoffload.c @@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler) #ifdef __cplusplus\n extern \C\\n #endif\n - void GOMP_offload_register (void *, int, void *);\n\n + void GOMP_offload_register (void *, int, void *);\n + void GOMP_offload_unregister (void *, int, void *);\n\n __attribute__((constructor))\n static void\n init (void)\n {\n GOMP_offload_register (__OFFLOAD_TABLE__, %d, __offload_target_data);\n + }\n\n, GOMP_DEVICE_INTEL_MIC); + + fprintf (src_file, + __attribute__((destructor))\n + static void\n + fini (void)\n + {\n +GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, __offload_target_data);\n }\n, GOMP_DEVICE_INTEL_MIC); + fclose (src_file); unsigned new_argc = 0; diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index d9cbff5..1072ae4 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -51,14 +51,12 @@ enum offload_target_type OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 }; -/* Auxiliary struct, used for transferring a host-target address range mapping - from plugin to libgomp. */ -struct mapping_table +/* Auxiliary struct, used for transferring pairs of addresses from plugin + to libgomp. */ +struct addr_pair { - uintptr_t host_start; - uintptr_t host_end; - uintptr_t tgt_start; - uintptr_t tgt_end; + uintptr_t start; + uintptr_t end; }; /* Miscellaneous functions. */ diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 3089401..a1d42c5 100644 ---
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Tue, Mar 31, 2015 at 09:25:26PM +0300, Ilya Verbin wrote: On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote: Shouldn't either this function, or gomp_offload_image_to_device lock also devicep-lock mutex and unlock at the end? Where exactly I guess depends on if the devicep-* hook calls should be guarded with the mutex or not. If yes, it should be this function and gomp_init_device. + if (devicep-type != target_type || !devicep-is_initialized) + continue; + Similarly. Oops, there is a deadlock. E.g. if gomp_map_vars locks devicep-lock and then calls gomp_fatal, the destructors from .fini section are executed, so gomp_mutex_lock in GOMP_offload_unregister will wait for devicep-lock. Thus perhaps before calling gomp_fatal you should release the device lock (if held) and register_lock (ditto). Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, Mar 30, 2015 at 18:42:02 +0200, Jakub Jelinek wrote: Shouldn't either this function, or gomp_offload_image_to_device lock also devicep-lock mutex and unlock at the end? Where exactly I guess depends on if the devicep-* hook calls should be guarded with the mutex or not. If yes, it should be this function and gomp_init_device. + if (devicep-type != target_type || !devicep-is_initialized) + continue; + Similarly. Oops, there is a deadlock. E.g. if gomp_map_vars locks devicep-lock and then calls gomp_fatal, the destructors from .fini section are executed, so gomp_mutex_lock in GOMP_offload_unregister will wait for devicep-lock. -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Tue, Mar 31, 2015 at 19:10:36 +0300, Ilya Verbin wrote: Ok, thanks for the clarification! Here is the new patch with variables. Unfortunately I see 4 fails in make check-target-libgomp with PTX patch applied on top, but with disabled offloading to PTX. Julian, have you seen them? All other tests passed with intelmic emul. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/acc_on_device-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test acc_on_device-1.c aborts here: /* Offloaded. */ #pragma acc parallel { if (acc_on_device (acc_device_none)) abort (); And here is the next version with fixed potential deadlock in GOMP_offload_unregister. make check-target-libgomp also passed. (but with PTX patch make check-target-libgomp has several fails mentioned above) diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c index f93007c..e101f93 100644 --- a/gcc/config/i386/intelmic-mkoffload.c +++ b/gcc/config/i386/intelmic-mkoffload.c @@ -350,14 +350,24 @@ generate_host_descr_file (const char *host_compiler) #ifdef __cplusplus\n extern \C\\n #endif\n - void GOMP_offload_register (void *, int, void *);\n\n + void GOMP_offload_register (void *, int, void *);\n + void GOMP_offload_unregister (void *, int, void *);\n\n __attribute__((constructor))\n static void\n init (void)\n {\n GOMP_offload_register (__OFFLOAD_TABLE__, %d, __offload_target_data);\n + }\n\n, GOMP_DEVICE_INTEL_MIC); + + fprintf (src_file, + __attribute__((destructor))\n + static void\n + fini (void)\n + {\n +GOMP_offload_unregister (__OFFLOAD_TABLE__, %d, __offload_target_data);\n }\n, GOMP_DEVICE_INTEL_MIC); + fclose (src_file); unsigned new_argc = 0; diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index d9cbff5..1072ae4 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -51,14 +51,12 @@ enum offload_target_type OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 }; -/* Auxiliary struct, used for transferring a host-target address range mapping - from plugin to libgomp. */ -struct mapping_table +/* Auxiliary struct, used for transferring pairs of addresses from plugin + to libgomp. */ +struct addr_pair { - uintptr_t host_start; - uintptr_t host_end; - uintptr_t tgt_start; - uintptr_t tgt_end; + uintptr_t start; + uintptr_t end; }; /* Miscellaneous functions. */ diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 3089401..a1d42c5 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -224,7 +224,6 @@ struct gomp_team_state }; struct target_mem_desc; -struct gomp_memory_mapping; /* These are the OpenMP 4.0 Internal Control Variables described in section 2.3.1. Those described as having one copy per task are @@ -657,7 +656,7 @@ struct target_mem_desc { struct gomp_device_descr *device_descr; /* Memory mapping info for the thread that created this descriptor. */ - struct gomp_memory_mapping *mem_map; + struct splay_tree_s *mem_map; /* List of splay keys to remove (or decrease refcount) at the end of region. */ @@ -683,20 +682,6 @@ struct splay_tree_key_s { #include splay-tree.h -/* Information about mapped memory regions (per device/context). */ - -struct gomp_memory_mapping -{ - /* Mutex for operating with the splay tree and other shared structures. */ - gomp_mutex_t lock; - - /* True when tables have been added to this memory map. */ - bool is_initialized; - - /* Splay tree containing information about mapped memory regions. */ - struct splay_tree_s splay_tree; -}; - typedef struct acc_dispatch_t { /* This is a linked list of data mapped using the @@ -773,19 +758,18 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*register_image_func) (void *, void *); void (*init_device_func) (int); void (*fini_device_func) (int); - int (*get_table_func) (int, struct mapping_table **); + int (*load_image_func) (int, void *, struct addr_pair **); + void (*unload_image_func) (int, void *); void *(*alloc_func) (int, size_t); void (*free_func) (int, void *); void *(*dev2host_func) (int, void *, const void *, size_t); void *(*host2dev_func) (int, void *, const void *, size_t); void (*run_func) (int, void *, void *); - /* Memory-mapping info for this device instance. */
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote: Here is the latest patch for libgomp and mic plugin. make check-target-libgomp using intelmic emul passed. Also I used a testcase from the attachment. This applies cleanly. Latest ptx part is here, I guess: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html But the one Julian posted doesn't apply on top of your patch. If there is any interdiff needed on top of your patch, can it be posted against trunk + your patch? +/* Insert mapping of host - target address pairs to splay tree. */ + +static void +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep, + struct addr_pair *host_addr, + struct addr_pair *tgt_addr) +{ + struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt)); + tgt-refcount = 1; + tgt-array = gomp_malloc (sizeof (*tgt-array)); + tgt-tgt_start = tgt_addr-start; + tgt-tgt_end = tgt_addr-end; + tgt-to_free = NULL; + tgt-list_count = 0; + tgt-device_descr = devicep; + splay_tree_node node = tgt-array; + splay_tree_key k = node-key; + k-host_start = host_addr-start; + k-host_end = host_addr-end; + k-tgt_offset = 0; + k-refcount = 1; + k-copy_from = false; + k-tgt = tgt; + node-left = NULL; + node-right = NULL; + splay_tree_insert (devicep-mem_map, node); +} What is the reason to register and allocate these one at a time, rather than using one struct target_mem_desc with one tgt-array for all splay tree nodes registered from one image? Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it clear it is special) and just use tgt_offset relative to that (i.e. absolute), but having to malloc each node individually and having to malloc a target_mem_desc for each one sounds expensive. Everything is freed just once anyway, isn't it? @@ -654,6 +727,18 @@ void GOMP_offload_register (void *host_table, enum offload_target_type target_type, void *target_data) { + int i; + gomp_mutex_lock (register_lock); + + /* Load image to all initialized devices. */ + for (i = 0; i num_devices; i++) +{ + struct gomp_device_descr *devicep = devices[i]; + if (devicep-type == target_type devicep-is_initialized) + gomp_offload_image_to_device (devicep, host_table, target_data); Shouldn't either this function, or gomp_offload_image_to_device lock also devicep-lock mutex and unlock at the end? Where exactly I guess depends on if the devicep-* hook calls should be guarded with the mutex or not. If yes, it should be this function and gomp_init_device. + if (devicep-type != target_type || !devicep-is_initialized) + continue; + Similarly. + devicep-unload_image_func (devicep-target_id, target_data); + + /* Remove mapping from splay tree. */ + for (j = 0; j num_funcs; j++) + { + struct splay_tree_key_s k; + k.host_start = (uintptr_t) host_func_table[j]; + k.host_end = k.host_start + 1; + splay_tree_remove (devicep-mem_map, k); + } + + for (j = 0; j num_vars; j++) + { + struct splay_tree_key_s k; + k.host_start = (uintptr_t) host_var_table[j*2]; + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1]; + splay_tree_remove (devicep-mem_map, k); + } +} Aren't you leaking here all the tgt-array and tgt allocations here? Though, if you change it to just two allocations (one tgt, one array), you'd need to free just once. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, 30 Mar 2015 18:42:02 +0200 Jakub Jelinek ja...@redhat.com wrote: On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote: Here is the latest patch for libgomp and mic plugin. make check-target-libgomp using intelmic emul passed. Also I used a testcase from the attachment. This applies cleanly. Latest ptx part is here, I guess: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html But the one Julian posted doesn't apply on top of your patch. If there is any interdiff needed on top of your patch, can it be posted against trunk + your patch? Here's a version of my patch against trunk and Ilya's latest patch (hopefully!). Tests look OK (libgomp + PTX). HTH, Juliancommit f203634ace786b5bb2fdce56f123f3fba236dda3 Author: Julian Brown jul...@codesourcery.com Date: Mon Mar 30 14:37:53 2015 -0700 nvptx load/unload support, init rework diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c index 02c44b6..dbc68bc 100644 --- a/gcc/config/nvptx/mkoffload.c +++ b/gcc/config/nvptx/mkoffload.c @@ -839,6 +839,7 @@ process (FILE *in, FILE *out) { const char *input = read_file (in); Token *tok = tokenize (input); + unsigned int nvars = 0, nfuncs = 0; do tok = parse_file (tok); @@ -850,16 +851,17 @@ process (FILE *in, FILE *out) write_stmts (out, rev_stmts (fns)); fprintf (out, ;\n\n); fprintf (out, static const char *var_mappings[] = {\n); - for (id_map *id = var_ids; id; id = id-next) + for (id_map *id = var_ids; id; id = id-next, nvars++) fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : ); fprintf (out, };\n\n); fprintf (out, static const char *func_mappings[] = {\n); - for (id_map *id = func_ids; id; id = id-next) + for (id_map *id = func_ids; id; id = id-next, nfuncs++) fprintf (out, \t\%s\%s\n, id-ptx_name, id-next ? , : ); fprintf (out, };\n\n); fprintf (out, static const void *target_data[] = {\n); - fprintf (out, ptx_code, var_mappings, func_mappings\n); + fprintf (out, ptx_code, (void*) %u, var_mappings, (void*) %u, + func_mappings\n, nvars, nfuncs); fprintf (out, };\n\n); fprintf (out, extern void GOMP_offload_register (const void *, int, void *);\n); diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index a1d42c5..5272f01 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -655,9 +655,6 @@ struct target_mem_desc { /* Corresponding target device descriptor. */ struct gomp_device_descr *device_descr; - /* Memory mapping info for the thread that created this descriptor. */ - struct splay_tree_s *mem_map; - /* List of splay keys to remove (or decrease refcount) at the end of region. */ splay_tree_key list[]; @@ -691,18 +688,6 @@ typedef struct acc_dispatch_t /* This is guarded by the lock in the outer struct gomp_device_descr. */ struct target_mem_desc *data_environ; - /* Extra information required for a device instance by a given target. */ - /* This is guarded by the lock in the outer struct gomp_device_descr. */ - void *target_data; - - /* Open or close a device instance. */ - void *(*open_device_func) (int n); - int (*close_device_func) (void *h); - - /* Set or get the device number. */ - int (*get_device_num_func) (void); - void (*set_device_num_func) (int); - /* Execute. */ void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *, unsigned short *, int, int, int, int, void *); @@ -720,7 +705,7 @@ typedef struct acc_dispatch_t void (*async_set_async_func) (int); /* Create/destroy TLS data. */ - void *(*create_thread_data_func) (void *); + void *(*create_thread_data_func) (int); void (*destroy_thread_data_func) (void *); /* NVIDIA target specific routines. */ diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c index 08b7c5e..1f5827e 100644 --- a/libgomp/oacc-async.c +++ b/libgomp/oacc-async.c @@ -26,7 +26,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. */ - +#include assert.h #include openacc.h #include libgomp.h #include oacc-int.h @@ -37,13 +37,23 @@ acc_async_test (int async) if (async acc_async_sync) gomp_fatal (invalid async argument: %d, async); - return base_dev-openacc.async_test_func (async); + struct goacc_thread *thr = goacc_thread (); + + if (!thr || !thr-dev) +gomp_fatal (no device active); + + return thr-dev-openacc.async_test_func (async); } int acc_async_test_all (void) { - return base_dev-openacc.async_test_all_func (); + struct goacc_thread *thr = goacc_thread (); + + if (!thr || !thr-dev) +gomp_fatal (no device active); + + return thr-dev-openacc.async_test_all_func (); } void @@ -52,19 +62,34 @@ acc_wait (int async) if (async acc_async_sync) gomp_fatal (invalid async argument: %d, async); - base_dev-openacc.async_wait_func (async); + struct goacc_thread *thr = goacc_thread (); + + if (!thr || !thr-dev) +gomp_fatal (no device
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Thu, 26 Mar 2015 13:09:19 +0100 Jakub Jelinek ja...@redhat.com wrote: On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote: If it is too late for such global changes (rework initialization in libgomp, change mic and ptx plugins), then here is a small workaround patch to fix offloading from libraries. Likely, it will not affect OpenACC programs with one image. make check-target-libgomp passed. Sorry for not getting to this earlier, really busy with severe regressions bugfixing lately. Anyway, IMHO it is not too late to fixing it properly, after all, the current code is majorly broken. As I've said earlier, e.g. the lack of mutex guarding gomp_target_init (which is using pthread_once guaranteed to be run just once) vs. concurrent GOMP_offload_register calls (if those are run from ctors, then I guess something like dl_load_lock ensures at least on glibc that multiple GOMP_offload_register calls aren't performed at the same time) in accessing/reallocating offload_images and num_offload_images and the lack of support to register further images after the gomp_target_init call (if you dlopen further shared libraries) is really bad. And it would be really nice to support the unloading. But I'm afraid I'm lost in what is the latest posted patch for that, and how has it been tested (whether just on MIC or MIC emul, or also for nvptx). So can you please post a link to the latest full patch and how it has been tested, and if it is still error prone if say one thread executes GOMP_target the first time and another at the same time dlopens some shared library that has offloading regions in it, fix that too? I couldn't say about that -- I don't have all the state on the locking problems at the moment. We still have a week or so to get this sorted out. Apologies again for the delay in getting this out. Here's a current version of the patch against the gomp4 branch (on top of Ilya's load/unload patch) which passes testing for nvptx/openacc/libgomp, modulo the usual (timing-related) noise in the lib-83.c test. Thomas, can we get this tested on mainline and with MIC emulation? This version fixes (some if not all) regressions with multiple NVidia devices, and removes the memory-map lock and is_initialised fields, reverting to just a splay tree in gomp_device_descr (as the code was before the OpenACC merge). (Our multi-GPU machine is temporarily out of action, so I can't easily test that setup at the moment). HTH, Julian ChangeLog gcc/ * config/nvptx/mkoffload.c (process): Support variable mapping. libgomp/ * libgomp.h (target_mem_desc: Remove mem_map field. (struct gomp_memory_mapping): Remove. (acc_dispatch_t): Remove open_device_func, close_device_func, get_device_num_func, set_device_num_func, target_data members. Change create_thread_data_func argument to device number instead of generic pointer. (struct gomp_device_descr): Replace mem_map field with splay tree directly. (gomp_free_memmap): Update prototype. * oacc-async.c (assert.h): Include. (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async) (acc_wait_all, acc_wait_all_async): Use current host thread's active device, not base_dev. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-host.c (host_dispatch): Don't set open_device_func, close_device_func, get_device_num_func or set_device_num_func. (goacc_host_init): Don't initialise host_dispatch.mem_map.lock. * oacc-init.c (base_dev, init_key): Remove. (cached_base_dev): New. (name_of_acc_device_t): New. (acc_init_1): Initialise default-numbered device, not zeroth. (acc_shutdown_1): Close all devices of a given type. (goacc_destroy_thread): Don't use base_dev. (lazy_open, lazy_init, lazy_init_and_open): Remove. (goacc_attach_host_thread_to_device): New. (acc_init): Reimplement with goacc_attach_host_thread_to_device. (acc_get_num_devices): Don't use base_dev. (acc_set_device_type): Reimplement. (acc_get_device_type): Don't use base_dev. (acc_get_device_num): Tweak logic. (acc_set_device_num): Likewise. (goacc_runtime_initialize): Initialize cached_base_dev not base_dev. (goacc_lazy_initialize): Reimplement with acc_init and goacc_attach_host_thread_to_device. * oacc-int.h (goacc_thread): Add base_dev field. (base_dev): Remove extern declaration. (goacc_attach_host_thread_to_device): Add prototype. * oacc-mem.c (lookup_host): Change first argument to gomp_device_descr. Use lock/splay tree from gomp_device_descr. (lookup_dev): Use lock from devicep not mem_map. (acc_malloc): Use current thread's device instead of base_dev. (acc_free): Likewise. (acc_memcpy_to_device): Likewise. (acc_memcpy_from_device): Likewise. (acc_deviceptr, acc_map_data, acc_unmap_data,
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote: If it is too late for such global changes (rework initialization in libgomp, change mic and ptx plugins), then here is a small workaround patch to fix offloading from libraries. Likely, it will not affect OpenACC programs with one image. make check-target-libgomp passed. Sorry for not getting to this earlier, really busy with severe regressions bugfixing lately. Anyway, IMHO it is not too late to fixing it properly, after all, the current code is majorly broken. As I've said earlier, e.g. the lack of mutex guarding gomp_target_init (which is using pthread_once guaranteed to be run just once) vs. concurrent GOMP_offload_register calls (if those are run from ctors, then I guess something like dl_load_lock ensures at least on glibc that multiple GOMP_offload_register calls aren't performed at the same time) in accessing/reallocating offload_images and num_offload_images and the lack of support to register further images after the gomp_target_init call (if you dlopen further shared libraries) is really bad. And it would be really nice to support the unloading. But I'm afraid I'm lost in what is the latest posted patch for that, and how has it been tested (whether just on MIC or MIC emul, or also for nvptx). So can you please post a link to the latest full patch and how it has been tested, and if it is still error prone if say one thread executes GOMP_target the first time and another at the same time dlopens some shared library that has offloading regions in it, fix that too? We still have a week or so to get this sorted out. Jakub
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote: On Mon, Mar 23, 2015 at 10:44:39PM +0300, Ilya Verbin wrote: If it is too late for such global changes (rework initialization in libgomp, change mic and ptx plugins), then here is a small workaround patch to fix offloading from libraries. Likely, it will not affect OpenACC programs with one image. make check-target-libgomp passed. Sorry for not getting to this earlier, really busy with severe regressions bugfixing lately. Anyway, IMHO it is not too late to fixing it properly, after all, the current code is majorly broken. As I've said earlier, e.g. the lack of mutex guarding gomp_target_init (which is using pthread_once guaranteed to be run just once) vs. concurrent GOMP_offload_register calls (if those are run from ctors, then I guess something like dl_load_lock ensures at least on glibc that multiple GOMP_offload_register calls aren't performed at the same time) in accessing/reallocating offload_images and num_offload_images and the lack of support to register further images after the gomp_target_init call (if you dlopen further shared libraries) is really bad. And it would be really nice to support the unloading. But I'm afraid I'm lost in what is the latest posted patch for that, and how has it been tested (whether just on MIC or MIC emul, or also for nvptx). So can you please post a link to the latest full patch and how it has been tested, and if it is still error prone if say one thread executes GOMP_target the first time and another at the same time dlopens some shared library that has offloading regions in it, fix that too? We still have a week or so to get this sorted out. Here is the latest patch for libgomp and mic plugin. make check-target-libgomp using intelmic emul passed. Also I used a testcase from the attachment. Latest ptx part is here, I guess: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html Thomas, could you please test these 2 patches together on nvptx? gcc/ * config/i386/intelmic-mkoffload.c (generate_host_descr_file): Call GOMP_offload_unregister from the destructor. libgomp/ * libgomp-plugin.h (struct mapping_table): Replace with addr_pair. * libgomp.h (struct gomp_memory_mapping): Remove. (struct target_mem_desc): Change type of mem_map from gomp_memory_mapping * to splay_tree_s *. (struct gomp_device_descr): Remove register_image_func, get_table_func. Add load_image_func, unload_image_func. Change type of mem_map from gomp_memory_mapping to splay_tree_s. Remove offload_regions_registered. (gomp_init_tables): Remove. (gomp_free_memmap): Change type of argument from gomp_memory_mapping * to splay_tree_s *. * libgomp.map (GOMP_4.0.1): Add GOMP_offload_unregister. * oacc-host.c (host_dispatch): Do not initialize register_image_func, get_table_func, mem_map.is_initialized, mem_map.splay_tree.root, offload_regions_registered. Initialize load_image_func, unload_image_func, mem_map.root. (goacc_host_init): Do not initialize host_dispatch.mem_map.lock. * oacc-init.c (lazy_open): Don't call gomp_init_tables. (acc_shutdown_1): Use dev's lock and splay_tree instead of mem_map's. * oacc-mem.c (lookup_host): Get gomp_device_descr *dev instead of gomp_memory_mapping *. Use dev's lock and splay_tree. (lookup_dev): Use dev's lock. (acc_deviceptr): Pass dev to lookup_host instead of mem_map. (acc_is_present): Likewise. (acc_map_data): Likewise. (acc_unmap_data): Likewise. Use dev's lock. (present_create_copy): Likewise. (delete_copyout): Pass dev to lookup_host instead of mem_map. (update_dev_host): Likewise. (gomp_acc_remove_pointer): Likewise. Use dev's lock. * oacc-parallel.c (GOACC_parallel): Use dev's lock and splay_tree. * plugin/plugin-host.c (GOMP_OFFLOAD_register_image): Remove. (GOMP_OFFLOAD_get_table): Remove (GOMP_OFFLOAD_load_image): New function. (GOMP_OFFLOAD_unload_image): New function. * target.c (register_lock): New mutex for offload image registration. (gomp_map_vars): Use dev's lock and splay_tree instead of mem_map's. (gomp_copy_from_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_update): Remove gomp_memory_mapping argument. Use dev's lock and splay_tree instead of mm's. (gomp_splay_tree_insert_mapping): New static function. (gomp_offload_image_to_device): Ditto. (GOMP_offload_register): Add mutex lock. Call gomp_offload_image_to_device for all initialized devices. (GOMP_offload_unregister): New function. (gomp_init_tables): Replace with gomp_init_device. Replace a call to get_table_func from the plugin with calls to init_device_func and
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Mon, Mar 09, 2015 at 14:45:55 +, Julian Brown wrote: On Fri, 6 Mar 2015 17:01:13 +0300 Ilya Verbin iver...@gmail.com wrote: On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote: On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote: Julian Brown jul...@codesourcery.com wrote: This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'. Do you still need them? Or we can use gomp_device_descr::lock and is_initialized instead? If yes, then we can replace the gomp_memory_mapping structure with a splay_tree, as it was before the OpenACC merge. Ping? Apologies, I've been distracted with travel and other things. I suspect, as you suggest, that the gomp_memory_mapping lock/is_initialized fields may no longer be required. I haven't yet had time to address that nor all of Thomas's comments on the patch (mostly breakage with multiple devices), and I'm unlikely to have time this week either due to vacation... If it is too late for such global changes (rework initialization in libgomp, change mic and ptx plugins), then here is a small workaround patch to fix offloading from libraries. Likely, it will not affect OpenACC programs with one image. make check-target-libgomp passed. PR libgomp/65338 libgomp/ * libgomp.h (struct gomp_device_descr): Remove offload_regions_registered. * oacc-host.c (host_dispatch): Do not initialize offload_regions_registered. * target.c (gomp_register_image_for_device): Do not check for offload_regions_registered. (gomp_target_init): Do not initialize offload_regions_registered. diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 3089401..f45fdba 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -793,9 +793,6 @@ struct gomp_device_descr /* Set to true when device is initialized. */ bool is_initialized; - /* True when offload regions have been registered with this device. */ - bool offload_regions_registered; - /* OpenACC-specific data and functions. */ /* This is mutable because of its mutable data_environ and target_data members. */ diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index 6aeb1e7..2763f44 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch = .mem_map.is_initialized = false, .mem_map.splay_tree.root = NULL, .is_initialized = false, -.offload_regions_registered = false, .openacc = { .open_device_func = GOMP_OFFLOAD_openacc_open_device, diff --git a/libgomp/target.c b/libgomp/target.c index 50baa4d..db1f509 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1035,13 +1035,8 @@ static void gomp_register_image_for_device (struct gomp_device_descr *device, struct offload_image_descr *image) { - if (!device-offload_regions_registered - (device-type == image-type - || device-type == OFFLOAD_TARGET_TYPE_HOST)) -{ - device-register_image_func (image-host_table, image-target_data); - device-offload_regions_registered = true; -} + if (device-type == image-type || device-type == OFFLOAD_TARGET_TYPE_HOST) +device-register_image_func (image-host_table, image-target_data); } /* This function initializes the runtime needed for offloading. @@ -1105,7 +1100,6 @@ gomp_target_init (void) current_device.mem_map.is_initialized = false; current_device.mem_map.splay_tree.root = NULL; current_device.is_initialized = false; - current_device.offload_regions_registered = false; current_device.openacc.data_environ = NULL; current_device.openacc.target_data = NULL; for (i = 0; i new_num_devices; i++) -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Fri, 6 Mar 2015 17:01:13 +0300 Ilya Verbin iver...@gmail.com wrote: On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote: On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote: Julian Brown jul...@codesourcery.com wrote: This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'. Do you still need them? Or we can use gomp_device_descr::lock and is_initialized instead? If yes, then we can replace the gomp_memory_mapping structure with a splay_tree, as it was before the OpenACC merge. Ping? Apologies, I've been distracted with travel and other things. I suspect, as you suggest, that the gomp_memory_mapping lock/is_initialized fields may no longer be required. I haven't yet had time to address that nor all of Thomas's comments on the patch (mostly breakage with multiple devices), and I'm unlikely to have time this week either due to vacation... Thanks, Julian
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Thu, Feb 26, 2015 at 20:25:11 +0300, Ilya Verbin wrote: On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote: Julian Brown jul...@codesourcery.com wrote: This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'. Do you still need them? Or we can use gomp_device_descr::lock and is_initialized instead? If yes, then we can replace the gomp_memory_mapping structure with a splay_tree, as it was before the OpenACC merge. Ping? Thanks, -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Hi, On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote: Julian Brown jul...@codesourcery.com wrote: This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from Currently the 'struct gomp_memory_mapping' contains 'lock' and 'is_initialized'. Do you still need them? Or we can use gomp_device_descr::lock and is_initialized instead? If yes, then we can replace the gomp_memory_mapping structure with a splay_tree, as it was before the OpenACC merge. Thanks, -- Ilya
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, 25 Feb 2015 10:36:08 +0100 Thomas Schwinge tho...@codesourcery.com wrote: Hi! On Tue, 24 Feb 2015 11:29:51 +, Julian Brown jul...@codesourcery.com wrote: Test results look OK, barring a suspected harness issue (lib-83 failing with a timeout for nvptx However, I'm seeing a class of testsuite regressions: all variants of libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90 FAIL: »libgomp: cuMemFreeHost error: invalid value«. I see these two test cases contain a lot of acc_get_num_devices and similar calls -- I've been testing this on our nvidiak20-2 system, which contains two Nvidia K20 cards, so maybe there's something wrong in that regard. (But why is this failing only for Fortran -- are we missing C/C++ tests in that area?) Can you have a look, or want me to? I can have a look at that. --- a/gcc/config/nvptx/mkoffload.c +++ b/gcc/config/nvptx/mkoffload.c @@ -850,16 +851,17 @@ process (FILE *in, FILE *out) fprintf (out, static const void *target_data[] = {\n); - fprintf (out, ptx_code, var_mappings, func_mappings\n); + fprintf (out, ptx_code, (void*) %u, var_mappings, (void*) %u, + func_mappings\n, nvars, nfuncs); fprintf (out, };\n\n); I wondered if it's maybe more elegant to just separate those by NULL delimiters instead of the size integers casted to void * (spaces missing)? But then, that'd need double scanning in the consumer, libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we need to allocate an appropriately sized array, so maybe your more expressive approach is better indeed. Yeah, I considered both: there's probably not much to choose between the approaches. They use the same amount of space. --- a/libgomp/oacc-async.c +++ b/libgomp/oacc-async.c @@ -34,44 +34,68 @@ int acc_async_test (int async) { + struct goacc_thread *thr = goacc_thread (); + if (async acc_async_sync) gomp_fatal (invalid async argument: %d, async); - return base_dev-openacc.async_test_func (async); + assert (thr-dev); + + return thr-dev-openacc.async_test_func (async); } Here, and in several other places: is this code conforming to the OpenACC specification? Do we need to (lazily) initialize in all these places, or in goacc_thread, or gracefully fail (see below) if not initialized (basically in all places where you currently assert (thr-dev)? #include openacc.h int main(int argc, char *argv[]) { return acc_async_test(0); } [sigsegv] Whether it conforms to the spec or not is a hard question to answer, because a lot of behaviour is left undefined. But here are two possibly-useful made-up guidelines: 1. Does the program work the same with OpenACC disabled? 2. Does some strange use of OpenACC functionality (including library calls, etc.) probably indicate user error? Much of the lazy initialisation code is there so that (1) can be true -- i.e., a program can use OpenACC directives without making an explicit call to acc_init or other API-specific initialisation code. But this case is an explicit call to the OpenACC runtime library, so the program can't work without -fopenacc enabled, so we can follow guideline (2) instead. And in this case, it's meaningless to test for completion of async operation when no device is active. Of course though, this should be an actual error rather than a crash. But, I don't think we want to lazily-initialise here. Also, I'm not sure what the expected outcome of this code sequence is: acc_init(acc_device_nvidia); acc_shutdown(acc_device_nvidia); acc_async_test(0); a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test: Assertion `thr-dev' failed. Aborted (core dumped) If the OpenACC specification can be read such that all this indeed is undefined behavior, then aborting/crashing is OK, of course. Again, this would probably indicate user error in a real program, so it should raise a (real) error message. --- a/libgomp/oacc-cuda.c +++ b/libgomp/oacc-cuda.c @@ -34,51 +34,53 @@ void * acc_get_current_cuda_device (void) { - void *p = NULL; + struct goacc_thread *thr = goacc_thread (); - if (base_dev base_dev-openacc.cuda.get_current_device_func) -p = base_dev-openacc.cuda.get_current_device_func (); + if (thr thr-dev thr-dev-openacc.cuda.get_current_device_func) +return thr-dev-openacc.cuda.get_current_device_func (); - return p; + return NULL; } Here, and in other places, it looks as if we'd fail gracefully. Not sure about this (maybe it should be an error too?), but... int acc_set_cuda_stream (int async, void *stream) { - int s = -1; + struct goacc_thread *thr; if (async 0 || stream == NULL) return 0; goacc_lazy_initialize (); - if (base_dev base_dev-openacc.cuda.set_stream_func) -s = base_dev-openacc.cuda.set_stream_func
Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On Wed, Feb 25, 2015 at 10:36:08 +0100, Thomas Schwinge wrote: Julian Brown jul...@codesourcery.com wrote: OK for gomp4 branch? I could commit Ilya's patch there too if so. I'll leave the decision to Jakub, but, what about trunk? As Ilya indicated in http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E, (at least part of) these patches are fixing a regression with offloading From shared libraries. (And maybe the rest qualifies as fixes and extensions to new code (offloading), so no danger to cause any regressions compared to the last GCC release?) BTW, when I removed calls to gomp_init_tables in https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html, I could accidentally remove some necessary gomp_mutex_lock/unlock. Also GOMP_offload_[un]register require some mutexes, as noted by Jakub. I'm going to fix this. So, I think we should commit all dependent patches to gomp4 branch, and I will post a fix for mutexes on top of them. Thanks, -- Ilya
libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Hi! On Tue, 24 Feb 2015 11:29:51 +, Julian Brown jul...@codesourcery.com wrote: On Wed, 4 Feb 2015 15:05:45 + Julian Brown jul...@codesourcery.com wrote: The major changes are: [...] Thanks for looking into this! This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading): http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E As well as Ilya Verbin's patch: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html (I also added http://news.gmane.org/find-root.php?message_id=%3C20141115000346.GF40445%40msticlxl57.ims.intel.com%3E to the mix.) Test results look OK, barring a suspected harness issue (lib-83 failing with a timeout for nvptx Yes; Jim's rewriting the timing code. However, I'm seeing a class of testsuite regressions: all variants of libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90 FAIL: »libgomp: cuMemFreeHost error: invalid value«. I see these two test cases contain a lot of acc_get_num_devices and similar calls -- I've been testing this on our nvidiak20-2 system, which contains two Nvidia K20 cards, so maybe there's something wrong in that regard. (But why is this failing only for Fortran -- are we missing C/C++ tests in that area?) Can you have a look, or want me to? OK for gomp4 branch? I could commit Ilya's patch there too if so. I'll leave the decision to Jakub, but, what about trunk? As Ilya indicated in http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E, (at least part of) these patches are fixing a regression with offloading From shared libraries. (And maybe the rest qualifies as fixes and extensions to new code (offloading), so no danger to cause any regressions compared to the last GCC release?) I have not reviewed all your changes; just a few comments: --- a/gcc/config/nvptx/mkoffload.c +++ b/gcc/config/nvptx/mkoffload.c @@ -850,16 +851,17 @@ process (FILE *in, FILE *out) fprintf (out, static const void *target_data[] = {\n); - fprintf (out, ptx_code, var_mappings, func_mappings\n); + fprintf (out, ptx_code, (void*) %u, var_mappings, (void*) %u, + func_mappings\n, nvars, nfuncs); fprintf (out, };\n\n); I wondered if it's maybe more elegant to just separate those by NULL delimiters instead of the size integers casted to void * (spaces missing)? But then, that'd need double scanning in the consumer, libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we need to allocate an appropriately sized array, so maybe your more expressive approach is better indeed. --- a/libgomp/oacc-async.c +++ b/libgomp/oacc-async.c @@ -34,44 +34,68 @@ int acc_async_test (int async) { + struct goacc_thread *thr = goacc_thread (); + if (async acc_async_sync) gomp_fatal (invalid async argument: %d, async); - return base_dev-openacc.async_test_func (async); + assert (thr-dev); + + return thr-dev-openacc.async_test_func (async); } (Here, and in several other places: I would have placed the declaration of thr and its initialization just before its first use, but then, no need to change that now.) Here, and in several other places: is this code conforming to the OpenACC specification? Do we need to (lazily) initialize in all these places, or in goacc_thread, or gracefully fail (see below) if not initialized (basically in all places where you currently assert (thr-dev)? #include openacc.h int main(int argc, char *argv[]) { return acc_async_test(0); } $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/ -Bbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Ibuild-gcc/x86_64-unknown-linux-gnu/./libgomp -Isource-gcc/libgomp -Binstall/offload-nvptx-none/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 -Binstall/offload-nvptx-none/bin -Binstall/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0 -Binstall/offload-x86_64-intelmicemul-linux-gnu/bin -Lbuild-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Wl,-rpath,build-gcc/x86_64-unknown-linux-gnu/./libgomp/.libs -Wall ../a.c -fopenacc -g $ gdb -q a.out Reading symbols from a.out...done. (gdb) r Starting program: [...]/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1. Program received signal SIGSEGV, Segmentation fault. acc_async_test (async=0) at [...]/source-gcc/libgomp/oacc-async.c:42 42assert (thr-dev); Also, I'm not sure what the expected outcome of this code sequence is: acc_init(acc_device_nvidia); acc_shutdown(acc_device_nvidia);
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi, On Wed, 4 Feb 2015 15:05:45 + Julian Brown jul...@codesourcery.com wrote: The major changes are: * The removal of the OpenACC-specific plugin hooks open_device, close_device, set_device_num and get_device_num. The functionality has been moved into the init/fini hooks (for the first two) or moved into the target-independent OpenACC parts, respectively. * The PTX mkoffload utility has been extended to support variables as well as function mapping, to fill out support for the load/unload image hooks. (Not really tested so far!) * The plugin hooks that are shared between OpenMP and OpenACC now support the device number argument properly: that should help with (eventually) unifying the plugin interface for the two APIs. (With set_device_num and get_device_num removed, the plugin is stateless with respect to which device is currently active. The rest of the OpenACC hooks -- async functions, etc. -- should probably be changed to take a device number argument too, but that could be a follow-on patch.) * The limitation of having only one type of device active simultaneously in the OpenACC runtime has (theoretically!) been removed. This is a version of the previously-posted patch to rework initialisation and support the proposed load/unload hooks, merged to gomp4 branch and tested alongside the two patches (from https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading): http://news.gmane.org/find-root.php?message_id=%3C20150218100035.GF1746%40tucnak.redhat.com%3E http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E As well as Ilya Verbin's patch: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01605.html Test results look OK, barring a suspected harness issue (lib-83 failing with a timeout for nvptx, though it works fine from the command line). OK for gomp4 branch? I could commit Ilya's patch there too if so. Thanks, Julian ChangeLog gcc/ * config/nvptx/mkoffload.c (process): Support variable mapping. libgomp/ * libgomp.h (acc_dispatch_t): Remove open_device_func, close_device_func, get_device_num_func, set_device_num_func, target_data members. Change create_thread_data_func argument to device number instead of generic pointer. * oacc-async.c (assert.h): Include. (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async) (acc_wait_all, acc_wait_all_async): Use current host thread's active device, not base_dev. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-host.c (host_dispatch): Don't set open_device_func, close_device_func, get_device_num_func or set_device_num_func. * oacc-init.c (base_dev, init_key): Remove. (cached_base_dev): New. (name_of_acc_device_t): New. (acc_init_1): Initialise default-numbered device, not zeroth. (acc_shutdown_1): Close all devices of a given type. (goacc_destroy_thread): Don't use base_dev. (lazy_open, lazy_init, lazy_init_and_open): Remove. (goacc_attach_host_thread_to_device): New. (acc_init): Reimplement with goacc_attach_host_thread_to_device. (acc_get_num_devices): Don't use base_dev. (acc_set_device_type): Reimplement. (acc_get_device_type): Don't use base_dev. (acc_get_device_num): Tweak logic. (acc_set_device_num): Likewise. (goacc_runtime_initialize): Initialize cached_base_dev not base_dev. (goacc_lazy_initialize): Reimplement with acc_init and goacc_attach_host_thread_to_device. * oacc-int.h (goacc_thread): Add base_dev field. (base_dev): Remove extern declaration. (goacc_attach_host_thread_to_device): Add prototype. * oacc-mem.c (acc_malloc): Use current thread's device instead of base_dev. (acc_free): Likewise. (acc_memcpy_to_device): Likewise. (acc_memcpy_from_device): Likewise. * oacc-parallel.c (select_acc_device): Remove. Replace calls with goacc_lazy_initialize (throughout). * target.c (gomp_load_plugin_for_device): Don't initialise openacc open_device, close_device, get_device_num or set_device_num hooks. Don't initialise target_data. * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_open_device) (GOMP_OFFLOAD_openacc_close_device) (GOMP_OFFLOAD_openacc_get_device_num) (GOMP_OFFLOAD_openacc_set_device_num): Remove. (GOMP_OFFLOAD_openacc_create_thread_data): Change (unused) argument to int. * plugin/plugin-nvptx.c (pthread.h): Include. (ptx_inited): Remove. (instantiated_devices, ptx_dev_lock): New. (struct ptx_image_data): New. (ptx_devices, ptx_images, ptx_image_lock): New. (nvptx_get_num_devices): Remove forward declaration. (nvptx_init): Change return type to bool. (nvptx_fini): Remove. (nvptx_attach_host_thread_to_device): New. (nvptx_open_device): Remove struct ptx_device* instead of void*. (nvptx_close_device): Change argument
Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_* (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Hi! On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. [...] gcc/ada/ * gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8) (DEF_FUNCTION_TYPE_VAR_12): New macros. Committed in r220910: commit 11d2c7638e26ab69df7167474c9aa8f5d4114703 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Feb 23 10:06:49 2015 + Fix number of arguments parameter in Ada DEF_FUNCTION_TYPE_*. gcc/ada/ * gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8) (DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220910 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ada/ChangeLog |5 + gcc/ada/gcc-interface/utils.c |4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git gcc/ada/ChangeLog gcc/ada/ChangeLog index ddd6e10..06a51ac 100644 --- gcc/ada/ChangeLog +++ gcc/ada/ChangeLog @@ -1,3 +1,8 @@ +2015-02-23 Thomas Schwinge tho...@codesourcery.com + + * gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_8) + (DEF_FUNCTION_TYPE_VAR_12): Fix number of arguments parameter. + 2015-02-22 Arnaud Charlet char...@adacore.com * doc/Makefile: postprocess texinfo files to update @dircategory diff --git gcc/ada/gcc-interface/utils.c gcc/ada/gcc-interface/utils.c index 44dad7b..d076da7 100644 --- gcc/ada/gcc-interface/utils.c +++ gcc/ada/gcc-interface/utils.c @@ -5477,11 +5477,11 @@ install_builtin_function_types (void) def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5); #define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ ARG6, ARG7, ARG8) \ - def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \ + def_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \ ARG7, ARG8); #define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \ - def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \ + def_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, \ ARG7, ARG8, ARG9, ARG10, ARG11, ARG12); #define DEF_POINTER_TYPE(ENUM, TYPE) \ builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]); Grüße, Thomas signature.asc Description: PGP signature
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Wed, Feb 04, 2015 at 15:05:45 +, Julian Brown wrote: This (WIP) patch is based on top of a version of your patch that I merged to our internal branch: that's still the easiest way for me to test the PTX backend (with unloading support) at present, and it passes libgomp testing that way. Trunk should be fairly close, but I haven't tried applying it there yet. The major changes are: * The removal of the OpenACC-specific plugin hooks open_device, close_device, set_device_num and get_device_num. The functionality has been moved into the init/fini hooks (for the first two) or moved into the target-independent OpenACC parts, respectively. * The PTX mkoffload utility has been extended to support variables as well as function mapping, to fill out support for the load/unload image hooks. (Not really tested so far!) * The plugin hooks that are shared between OpenMP and OpenACC now support the device number argument properly: that should help with (eventually) unifying the plugin interface for the two APIs. (With set_device_num and get_device_num removed, the plugin is stateless with respect to which device is currently active. The rest of the OpenACC hooks -- async functions, etc. -- should probably be changed to take a device number argument too, but that could be a follow-on patch.) * The limitation of having only one type of device active simultaneously in the OpenACC runtime has (theoretically!) been removed. Thoughts? Up. I have no comments here since I'm not familiar with OpenACC and PTX, but I hope that Thomas and Jakub will review this and my corresponding patches [1], [2] before the final closure of the trunk. [1] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02275.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01912.html -- Ilya
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi! On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. This whole file is scheduled to go away: the routines are to be replaced by builtins which are expanded in the nvptx backend, but until we're there, here's a patch to make it at least work; committed to trunk in r220768: commit f816d7a63c8bc11c81080a0b34bf587d46b6f4c6 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Feb 17 18:05:24 2015 + libgomp: Make nvptx helper routines self-contained. libgomp/ * oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid, GOACC_ctaid, and GOACC_nctaid routines. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220768 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog |6 ++ libgomp/oacc-ptx.h | 224 2 files changed, 230 insertions(+) diff --git libgomp/ChangeLog libgomp/ChangeLog index 6c24531..2c32d9e 100644 --- libgomp/ChangeLog +++ libgomp/ChangeLog @@ -1,3 +1,9 @@ +2015-02-17 Thomas Schwinge tho...@codesourcery.com + Cesar Philippidis ce...@codesourcery.com + + * oacc-ptx.h (GOACC_INTERNAL_PTX): Add GOACC_tid, GOACC_ntid, + GOACC_ctaid, and GOACC_nctaid routines. + 2015-02-11 Jakub Jelinek ja...@redhat.com PR c/64824 diff --git libgomp/oacc-ptx.h libgomp/oacc-ptx.h index 13ff86f..2419a46 100644 --- libgomp/oacc-ptx.h +++ libgomp/oacc-ptx.h @@ -101,9 +101,233 @@ .version 3.1\n \ .target sm_30\n \ .address_size 64\n \ + .visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1);\n \ + .visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1);\n \ + .visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1);\n \ + .visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1);\n \ .visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n \ .visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n \ .extern .func abort;\n \ + .visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n \ + {\n \ + .reg .u32 %ar1;\n \ + .reg .u32 %retval;\n \ + .reg .u64 %hr10;\n \ + .reg .u32 %r22;\n \ + .reg .u32 %r23;\n \ + .reg .u32 %r24;\n \ + .reg .u32 %r25;\n \ + .reg .u32 %r26;\n \ + .reg .u32 %r27;\n \ + .reg .u32 %r28;\n \ + .reg .u32 %r29;\n \ + .reg .pred %r30;\n \ + .reg .u32 %r31;\n \ + .reg .pred %r32;\n \ + .reg .u32 %r33;\n \ + .reg .pred %r34;\n \ + .local .align 8 .b8 %frame[4];\n \ + ld.param.u32 %ar1,[%in_ar1];\n \ + mov.u32 %r27,%ar1;\n \ + st.local.u32 [%frame],%r27;\n \ + ld.local.u32 %r28,[%frame];\n \ + mov.u32 %r29,1;\n \ + setp.eq.u32 %r30,%r28,%r29;\n \ + @%r30 bra $L4;\n \ + mov.u32 %r31,2;\n \ + setp.eq.u32 %r32,%r28,%r31;\n \ + @%r32 bra $L5;\n \ + mov.u32 %r33,0;\n \ + setp.eq.u32 %r34,%r28,%r33;\n \ + @!%r34 bra $L8;\n \ + mov.u32 %r23,%tid.x;\n \ + mov.u32 %r22,%r23;\n \ + bra $L7;\n \ + $L4:\n \ + mov.u32 %r24,%tid.y;\n \ + mov.u32 %r22,%r24;\n \ + bra $L7;\n \ + $L5:\n \ + mov.u32 %r25,%tid.z;\n \ + mov.u32 %r22,%r25;\n \ + bra $L7;\n \ + $L8:\n \ + {\n \ + {\n \ + call abort;\n \ + }\n \ + }\n \ + $L7:\n \ + mov.u32 %r26,%r22;\n \ + mov.u32 %retval,%r26;\n\ + st.param.u32 [%out_retval],%retval;\n \ + ret;\n
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Tue, 3 Feb 2015 23:01:04 +0300 Ilya Verbin iver...@gmail.com wrote: On 03 Feb 13:00, Julian Brown wrote: On Tue, 3 Feb 2015 14:28:44 +0300 Ilya Verbin iver...@gmail.com wrote: On 27 Jan 14:07, Julian Brown wrote: On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Do you have any progress on this? I'm still working on a patch to update OpenACC support and the PTX backend to use load/unload_image and to unify initialisation/opening. So far I think the answer is basically yes, the new interface can be supported, though I might request a minor tweak -- e.g. that load_image takes an extra void ** argument so that a libgomp backend can allocate a block of generic metadata relating to the image, then that same block would be passed (void *) to the unload hook so the backend can use it there and deallocate it when it's finished with. Would that be possible? (It'd mostly be for a CUmodule handle: this could be stashed away somewhere within the nvptx backend, but it might be neater to put it in generic code since it'll probably be useful for other backends anyway.) An extra argument is not a problem, however I don't quite get the idea. PTX plugin allocates some data while loading, and needs this data while unloading? Then why not to create a hash table with image_ptr - metadata mapping inside the plugin? [...] Right -- that's what I meant by could be stashed away somewhere within the nvptx backend. I just thought that retaining a generic chunk of state for each (JIT-compiled, in this case) block of code might be something that would be useful for other targets too. I've kept the required information (for now at least) within the nvptx backend as an associative list. This (WIP) patch is based on top of a version of your patch that I merged to our internal branch: that's still the easiest way for me to test the PTX backend (with unloading support) at present, and it passes libgomp testing that way. Trunk should be fairly close, but I haven't tried applying it there yet. The major changes are: * The removal of the OpenACC-specific plugin hooks open_device, close_device, set_device_num and get_device_num. The functionality has been moved into the init/fini hooks (for the first two) or moved into the target-independent OpenACC parts, respectively. * The PTX mkoffload utility has been extended to support variables as well as function mapping, to fill out support for the load/unload image hooks. (Not really tested so far!) * The plugin hooks that are shared between OpenMP and OpenACC now support the device number argument properly: that should help with (eventually) unifying the plugin interface for the two APIs. (With set_device_num and get_device_num removed, the plugin is stateless with respect to which device is currently active. The rest of the OpenACC hooks -- async functions, etc. -- should probably be changed to take a device number argument too, but that could be a follow-on patch.) * The limitation of having only one type of device active simultaneously in the OpenACC runtime has (theoretically!) been removed. Thoughts? Thanks, Julian ChangeLog gcc/ * config/nvptx/mkoffload.c (process): Support variable mapping. libgomp/ * libgomp.h (acc_dispatch_t): Remove open_device_func, close_device_func, get_device_num_func, set_device_num_func, target_data members. Change create_thread_data_func argument to device number instead of generic pointer. * oacc-async.c (assert.h): Include. (acc_async_test, acc_async_test_all, acc_wait, acc_wait_async) (acc_wait_all, acc_wait_all_async): Use current host thread's active device, not base_dev. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-host.c (host_dispatch): Don't set open_device_func, close_device_func, get_device_num_func or set_device_num_func. * oacc-init.c (base_dev, init_key): Remove. (cached_base_dev): New. (name_of_acc_device_t): New. (acc_init_1): Initialise default-numbered device, not zeroth. (acc_shutdown_1): Close all devices of a given type. (goacc_destroy_thread): Don't use base_dev. (lazy_open, lazy_init, lazy_init_and_open): Remove. (goacc_attach_host_thread_to_device): New. (acc_init): Reimplement with goacc_attach_host_thread_to_device. (acc_get_num_devices): Don't use base_dev. (acc_set_device_type): Reimplement. (acc_get_device_type): Don't use base_dev. (acc_get_device_num): Tweak
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi Julian! On 27 Jan 14:07, Julian Brown wrote: On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Do you have any progress on this? Thanks, -- Ilya
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Tue, 3 Feb 2015 14:28:44 +0300 Ilya Verbin iver...@gmail.com wrote: Hi Julian! On 27 Jan 14:07, Julian Brown wrote: On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Do you have any progress on this? I'm still working on a patch to update OpenACC support and the PTX backend to use load/unload_image and to unify initialisation/opening. So far I think the answer is basically yes, the new interface can be supported, though I might request a minor tweak -- e.g. that load_image takes an extra void ** argument so that a libgomp backend can allocate a block of generic metadata relating to the image, then that same block would be passed (void *) to the unload hook so the backend can use it there and deallocate it when it's finished with. Would that be possible? (It'd mostly be for a CUmodule handle: this could be stashed away somewhere within the nvptx backend, but it might be neater to put it in generic code since it'll probably be useful for other backends anyway.) Thanks, Julian
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On 03 Feb 13:00, Julian Brown wrote: On Tue, 3 Feb 2015 14:28:44 +0300 Ilya Verbin iver...@gmail.com wrote: On 27 Jan 14:07, Julian Brown wrote: On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Do you have any progress on this? I'm still working on a patch to update OpenACC support and the PTX backend to use load/unload_image and to unify initialisation/opening. So far I think the answer is basically yes, the new interface can be supported, though I might request a minor tweak -- e.g. that load_image takes an extra void ** argument so that a libgomp backend can allocate a block of generic metadata relating to the image, then that same block would be passed (void *) to the unload hook so the backend can use it there and deallocate it when it's finished with. Would that be possible? (It'd mostly be for a CUmodule handle: this could be stashed away somewhere within the nvptx backend, but it might be neater to put it in generic code since it'll probably be useful for other backends anyway.) An extra argument is not a problem, however I don't quite get the idea. PTX plugin allocates some data while loading, and needs this data while unloading? Then why not to create a hash table with image_ptr - metadata mapping inside the plugin? In this case, to the unload hook can deallocate metadata using the image_ptr key. Since this metadata is target-specific, I believe it would be better to keep it inside the plugin. -- Ilya
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Mon, 26 Jan 2015 17:34:26 +0300 Ilya Verbin iver...@gmail.com wrote: Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? I think it can probably be made to work. I'll have a look in more detail. Thanks, Julian
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Thomas, Any plans to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64635 soon? On x86_64 darwin, the OpenACC merge resulted a huge number of failures in the libgomp test suite… === libgomp Summary === # of expected passes 10628 # of unexpected failures 724 # of unsupported tests 562 which are resolved with a fix similar to https://gcc.gnu.org/bugzilla/attachment.cgi?id=34480. Jack On Mon, Jan 26, 2015 at 8:44 AM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! Sorry for the late answer -- I've been on sick leave, and just now returning to work. Julian, would you please have a look at the following issues? In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin iver...@gmail.com wrote: On 17 Jan 02:16, Ilya Verbin wrote: Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Sorry for that! Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) So, you don't assume that a device can have multiple images from multiple libs? Ping? This probably is just a bug that we introduced with our changes? (Julian?) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? Currently I'm trying to rebase on trunk my old patch, which fixes offloading from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX. Grüße, Thomas
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Mon, 26 Jan 2015 14:44:19 +0100 Thomas Schwinge tho...@codesourcery.com wrote: On 17 Jan 02:16, Ilya Verbin wrote: Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Sorry for that! Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) So, you don't assume that a device can have multiple images from multiple libs? Ping? This probably is just a bug that we introduced with our changes? (Julian?) AFAICR, we haven't yet figured out how to make (shared) libraries work with PTX. Actually I'm not entirely sure if static libraries containing PTX code will work either. But, multiple images (e.g. from different object files) are supported, via the loop in gomp_target_init. (The semantics of gomp_register_image_for_device were changed, but not -- intentionally! -- to limit the number of offloaded images to one.) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? This was certainly necessary at some point, when the support for multiple devices of the same type in the OpenACC runtime was delegated entirely to target-dependent code. Later (after one round of refactoring), the gomp_device_descr and the memory map were still separate, with the former possibly representing a number of devices, and the latter having independent copies for each instance of a device. That's largely been refactored (again) away now though -- a gomp_device_descr and its memory map are stored together, per-device instance. So this separation of their initialisation can probably go away, although some (somewhat delicate) code in oacc-init.c would need to be tweaked. Julian
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi! Sorry for the late answer -- I've been on sick leave, and just now returning to work. Julian, would you please have a look at the following issues? In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! On Fri, 23 Jan 2015 20:20:53 +0300, Ilya Verbin iver...@gmail.com wrote: On 17 Jan 02:16, Ilya Verbin wrote: Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Sorry for that! Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) So, you don't assume that a device can have multiple images from multiple libs? Ping? This probably is just a bug that we introduced with our changes? (Julian?) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? Currently I'm trying to rebase on trunk my old patch, which fixes offloading from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX. Grüße, Thomas pgpNuLLIR31d6.pgp Description: PGP signature
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On 26 Jan 14:44, Thomas Schwinge wrote: On 17 Jan 02:16, Ilya Verbin wrote: Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: http://news.gmane.org/find-root.php?message_id=%3C20150116231632.GB48380%40msticlxl57.ims.intel.com%3E Probably a good motivation for adding such a test case. ;-) I thought about it, but I don't know how to compile 2 binaries and run one of them using dejagnu. So, you don't assume that a device can have multiple images from multiple libs? This probably is just a bug that we introduced with our changes? (Julian?) Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? As I understand it (again, Julian, please correct me if I got that wrong), the reason is that for OpenACC support, we need these as two separate (independent) actions. Is this causing problems for OpenMP offloading? I'm asking since in this patch http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html I tried to change libgomp-plugin interface to enable offloading from libs, loaded at any time. My proposal was to replace GOMP_OFFLOAD_register_image and GOMP_OFFLOAD_get_table with GOMP_OFFLOAD_[un]load_image. When target device is initialized, GOMP_OFFLOAD_load_image registers one image in the plugin and returns corresponding target addresses for the image. The mapping between host and target addresses happens as previously. I hope that this approach is suitable for both MIC and PTX. Here is my current patch, it works for OpenMP-MIC, but obviously will not work for PTX, since it requires symmetrical changes in the plugin. Could you please take a look, whether it is possible to support this new interface in PTX plugin? diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index d9cbff5..1072ae4 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -51,14 +51,12 @@ enum offload_target_type OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 }; -/* Auxiliary struct, used for transferring a host-target address range mapping - from plugin to libgomp. */ -struct mapping_table +/* Auxiliary struct, used for transferring pairs of addresses from plugin + to libgomp. */ +struct addr_pair { - uintptr_t host_start; - uintptr_t host_end; - uintptr_t tgt_start; - uintptr_t tgt_end; + uintptr_t start; + uintptr_t end; }; /* Miscellaneous functions. */ diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 3089401..4e021f9 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -773,10 +773,10 @@ struct gomp_device_descr unsigned int (*get_caps_func) (void); int (*get_type_func) (void); int (*get_num_devices_func) (void); - void (*register_image_func) (void *, void *); void (*init_device_func) (int); void (*fini_device_func) (int); - int (*get_table_func) (int, struct mapping_table **); + int (*load_image_func) (int, void *, struct addr_pair **); + void (*unload_image_func) (int, void *); void *(*alloc_func) (int, size_t); void (*free_func) (int, void *); void *(*dev2host_func) (int, void *, const void *, size_t); @@ -793,9 +793,6 @@ struct gomp_device_descr /* Set to true when device is initialized. */ bool is_initialized; - /* True when offload regions have been registered with this device. */ - bool offload_regions_registered; - /* OpenACC-specific data and functions. */ /* This is mutable because of its mutable data_environ and target_data members. */ diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index f44174e..2b2b953 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -231,6 +231,7 @@ GOMP_4.0 { GOMP_4.0.1 { global: GOMP_offload_register; + GOMP_offload_unregister; } GOMP_4.0; OACC_2.0 { diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index 6aeb1e7..5d67c6c 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -43,10 +43,10 @@ static struct gomp_device_descr host_dispatch = .get_caps_func = GOMP_OFFLOAD_get_caps, .get_type_func = GOMP_OFFLOAD_get_type, .get_num_devices_func = GOMP_OFFLOAD_get_num_devices, -.register_image_func = GOMP_OFFLOAD_register_image, .init_device_func = GOMP_OFFLOAD_init_device, .fini_device_func = GOMP_OFFLOAD_fini_device, -.get_table_func = GOMP_OFFLOAD_get_table, +.load_image_func = GOMP_OFFLOAD_load_image, +.unload_image_func = GOMP_OFFLOAD_unload_image, .alloc_func = GOMP_OFFLOAD_alloc, .free_func = GOMP_OFFLOAD_free, .dev2host_func = GOMP_OFFLOAD_dev2host, @@ -56,7 +56,6 @@ static struct gomp_device_descr host_dispatch = .mem_map.is_initialized = false, .mem_map.splay_tree.root = NULL, .is_initialized = false, -.offload_regions_registered = false, .openacc = { .open_device_func = GOMP_OFFLOAD_openacc_open_device, diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Fri, Jan 23, 2015 at 08:20:53PM +0300, Ilya Verbin wrote: On 17 Jan 02:16, Ilya Verbin wrote: Hi! Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: ... So, you don't assume that a device can have multiple images from multiple libs? Ping? Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? Currently I'm trying to rebase on trunk my old patch, which fixes offloading from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX. There is also the problem that GOMP_offload_register doesn't use any locking, so one thread could be in the middle of dlopening some shared library and doing GOMP_offload_register in there, and another thread calling gomp_target_init at the same time, so you could reference freed memory if GOMP_offload_register had to reallocate etc. Jakub
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On 17 Jan 02:16, Ilya Verbin wrote: Hi! Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: ... So, you don't assume that a device can have multiple images from multiple libs? Ping? Also, could you please explain, why did you divide a device initialization into two functions -- gomp_init_device and gomp_init_tables? Currently I'm trying to rebase on trunk my old patch, which fixes offloading from dlopened libraries: http://gcc.gnu.org/ml/gcc-patches/2014-11/msg01604.html It works for OpenMP and MIC, but I don't know how not to break OpenACC and PTX. -- Ilya
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hello! In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! This patch breaks bootstrap on alpha-linux-gnu: /space/homedirs/uros/gcc-build/./gcc/xgcc -B/space/homedirs/uros/gcc-build/./gcc/ -B/usr/local/alphaev68-unknown-linux-gnu/bin/ -B/usr/local/alphaev68-unknown-linux-gnu/lib/ -isystem /usr/local/alphaev68-unknown-linux-gnu/include -isystem /usr/local/alphaev68-unknown-linux-gnu/sys-include -DHAVE_CONFIG_H -I. -I../../../gcc-svn/trunk/libgomp -I../../../gcc-svn/trunk/libgomp/config/linux/alpha -I../../../gcc-svn/trunk/libgomp/config/linux -I../../../gcc-svn/trunk/libgomp/config/posix -I../../../gcc-svn/trunk/libgomp -I../../../gcc-svn/trunk/libgomp/../include -Wall -Werror -pthread -ftls-model=initial-exec -g -O2 -mieee -MT oacc-init.lo -MD -MP -MF .deps/oacc-init.Tpo -c ../../../gcc-svn/trunk/libgomp/oacc-init.c -fPIC -DPIC -o .libs/oacc-init.o ../../../gcc-svn/trunk/libgomp/oacc-init.c: In function ‘acc_on_device’: ../../../gcc-svn/trunk/libgomp/oacc-init.c:571:10: internal compiler error: Segmentation fault return __builtin_acc_on_device (dev); ^ 0x120bf4c13 crash_signal ../../gcc-svn/trunk/gcc/toplev.c:381 0x1203fc55c expand_builtin_acc_on_device ../../gcc-svn/trunk/gcc/builtins.c:5933 0x120400063 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) ../../gcc-svn/trunk/gcc/builtins.c:7087 0x12063b5b3 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc-svn/trunk/gcc/expr.c:10488 0x12062cc93 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc-svn/trunk/gcc/expr.c:8017 0x1206090bb expand_expr ../../gcc-svn/trunk/gcc/expr.h:254 0x12062e2a7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../gcc-svn/trunk/gcc/expr.c:8248 0x12063ce2f expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc-svn/trunk/gcc/expr.c:10779 0x12062cc93 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc-svn/trunk/gcc/expr.c:8017 0x1206090bb expand_expr ../../gcc-svn/trunk/gcc/expr.h:254 (gdb) r Starting program: /space/uros/gcc-build/gcc/cc1 -quiet -g -O2 -mieee oacc-init.i Program received signal SIGSEGV, Segmentation fault. 0x0001203fc55c in expand_builtin_acc_on_device (exp=0x2b17688, target=0x0) at ../../gcc-svn/trunk/gcc/builtins.c:5933 5933 if (!REG_P (target) || GET_MODE (target) != target_mode) (gdb) p target $1 = (rtx) 0x0 (gdb) list 5928#else 5929 v1 = GEN_INT (GOMP_DEVICE_NONE); 5930 v2 = GEN_INT (GOMP_DEVICE_HOST); 5931#endif 5932 machine_mode target_mode = TYPE_MODE (integer_type_node); 5933 if (!REG_P (target) || GET_MODE (target) != target_mode) 5934target = gen_reg_rtx (target_mode); 5935 emit_move_insn (target, const1_rtx); 5936 rtx_code_label *done_label = gen_label_rtx (); 5937 do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX, target is NULL in expand_builtin_acc_on_device. (I can open a PR with a preprocessed file if requested). Uros.
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Fri, Jan 16, 2015 at 11:39 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! This patch breaks bootstrap on alpha-linux-gnu: target is NULL in expand_builtin_acc_on_device. Following patch allows bootstrap to continue: --cut here-- Index: builtins.c === --- builtins.c (revision 219712) +++ builtins.c (working copy) @@ -5930,7 +5930,7 @@ expand_builtin_acc_on_device (tree exp, rtx target v2 = GEN_INT (GOMP_DEVICE_HOST); #endif machine_mode target_mode = TYPE_MODE (integer_type_node); - if (!REG_P (target) || GET_MODE (target) != target_mode) + if (!target || !register_operand (target, target_mode)) target = gen_reg_rtx (target_mode); emit_move_insn (target, const1_rtx); rtx_code_label *done_label = gen_label_rtx (); --cut here-- Uros.
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Fri, Jan 16, 2015 at 11:46:35AM +0100, Uros Bizjak wrote: On Fri, Jan 16, 2015 at 11:39 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! This patch breaks bootstrap on alpha-linux-gnu: target is NULL in expand_builtin_acc_on_device. Following patch allows bootstrap to continue: Ok for trunk with proper ChangeLog entry. Thanks. --- builtins.c (revision 219712) +++ builtins.c (working copy) @@ -5930,7 +5930,7 @@ expand_builtin_acc_on_device (tree exp, rtx target v2 = GEN_INT (GOMP_DEVICE_HOST); #endif machine_mode target_mode = TYPE_MODE (integer_type_node); - if (!REG_P (target) || GET_MODE (target) != target_mode) + if (!target || !register_operand (target, target_mode)) target = gen_reg_rtx (target_mode); emit_move_insn (target, const1_rtx); rtx_code_label *done_label = gen_label_rtx (); --cut here-- Uros. Jakub
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi Thomas, On Thursday 2015-01-15 21:20, Thomas Schwinge wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! this breaks bootstrap on FreeBSD 8/amd64 from what I can tell: libtool: compile: /scratch/tmp/gerald/OBJ-0116-1138/./gcc/xgcc -B/scratch/tmp/gerald/OBJ-0116-1138/./gcc/ -B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/bin/ -B/home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/lib/ -isystem /home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/include -isystem /home/gerald/gcc-ref8-amd64/x86_64-unknown-freebsd8.4/sys-include -DHAVE_CONFIG_H -I. -I/sc ratch/tmp/gerald/gcc-HEAD/libgomp -I/scratch/tmp/gerald/gcc-HEAD/libgomp/config/ posix -I/scratch/tmp/gerald/gcc-HEAD/libgomp -I/scratch/tmp/gerald/gcc-HEAD/libg omp/../include -Wall -pthread -Werror -g -O2 -MT work.lo -MD -MP -MF .deps/work. Tpo -c /scratch/tmp/gerald/gcc-HEAD/libgomp/work.c -o work.o /dev/null 21 /scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: alloca. h: No such file or directory compilation terminated. % find /usr/include/ -name alloca.h % i.e., FreeBSD does not feature the alloca.h header and declares alloca() in stdlib.h. Gerald
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Once I work around the previous failure, I quickly get another one on FreeBSD 8.4/amd64: /scratch/tmp/gerald/gcc-HEAD/libgomp/target.c:67:12: error: \xe2\x80\x98num_devices\xe2\x80\x99 defined but not used [-Werror=unused-variable] static int num_devices; ^ This one did not require autoconf, so I went ahead and fixed it myself. ;-) Committed after successful bootstrap on x86_64-unknown-freebsd8.4 (after hacking around the other failure). Gerald 2015-01-16 Gerald Pfeifer ger...@pfeifer.com * target.c (num_devices): Guard with PLUGIN_SUPPORT. Index: target.c === --- target.c(revision 219741) +++ target.c(revision 219742) @@ -63,8 +63,10 @@ /* Array of descriptors for all available devices. */ static struct gomp_device_descr *devices; +#ifdef PLUGIN_SUPPORT /* Total number of available devices. */ static int num_devices; +#endif /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices. */ static int num_devices_openmp;
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote: * builtin-types.def (BT_FN_VOID_INT_INT_VAR) (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR) (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR): New function types. This broke bootstrap with --enable-languages=jit. Fixed thusly, committed as obvious: 2015-01-16 Jakub Jelinek ja...@redhat.com * jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of last argument. (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and undef afterwards. * jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Likewise. --- gcc/jit/jit-builtins.h.jj 2015-01-14 11:00:22.0 +0100 +++ gcc/jit/jit-builtins.h 2015-01-16 15:35:13.440088390 +0100 @@ -45,7 +45,12 @@ enum jit_builtin_type #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME, #define DEF_FUNCTION_TYPE_VAR_3(NAME, RETURN, ARG1, ARG2, ARG3) NAME, #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME, -#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \ +#define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \ + NAME, +#define DEF_FUNCTION_TYPE_VAR_8(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ + ARG6, ARG7, ARG8) NAME, +#define DEF_FUNCTION_TYPE_VAR_12(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ +ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \ NAME, #define DEF_POINTER_TYPE(NAME, TYPE) NAME, #include builtin-types.def @@ -65,6 +70,8 @@ enum jit_builtin_type #undef DEF_FUNCTION_TYPE_VAR_3 #undef DEF_FUNCTION_TYPE_VAR_4 #undef DEF_FUNCTION_TYPE_VAR_5 +#undef DEF_FUNCTION_TYPE_VAR_8 +#undef DEF_FUNCTION_TYPE_VAR_12 #undef DEF_POINTER_TYPE BT_LAST }; /* enum jit_builtin_type */ --- gcc/jit/jit-builtins.c.jj 2015-01-14 11:00:22.0 +0100 +++ gcc/jit/jit-builtins.c 2015-01-16 15:35:02.176276537 +0100 @@ -313,6 +313,15 @@ builtins_manager::make_type (enum jit_bu case ENUM: return make_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4); #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \ case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5); +#define DEF_FUNCTION_TYPE_VAR_8(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ + ARG6, ARG7, ARG8) \ + case ENUM: return make_fn_type (ENUM, RETURN, 1, 8, ARG1, ARG2, ARG3, \ + ARG4, ARG5, ARG6, ARG7, ARG8); +#define DEF_FUNCTION_TYPE_VAR_12(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \ +ARG6, ARG7, ARG8, ARG9, ARG10, ARG11, ARG12) \ + case ENUM: return make_fn_type (ENUM, RETURN, 1, 12, ARG1, ARG2, ARG3, \ + ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \ + ARG10, ARG11, ARG12); #define DEF_POINTER_TYPE(ENUM, TYPE) \ case ENUM: return make_ptr_type (ENUM, TYPE); @@ -334,6 +343,8 @@ builtins_manager::make_type (enum jit_bu #undef DEF_FUNCTION_TYPE_VAR_3 #undef DEF_FUNCTION_TYPE_VAR_4 #undef DEF_FUNCTION_TYPE_VAR_5 +#undef DEF_FUNCTION_TYPE_VAR_8 +#undef DEF_FUNCTION_TYPE_VAR_12 #undef DEF_POINTER_TYPE default: Jakub
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On Fri, 2015-01-16 at 15:39 +0100, Jakub Jelinek wrote: On Thu, Jan 15, 2015 at 09:20:07PM +0100, Thomas Schwinge wrote: * builtin-types.def (BT_FN_VOID_INT_INT_VAR) (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR_INT_INT_VAR) (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR): New function types. This broke bootstrap with --enable-languages=jit. Fixed thusly, committed as obvious: 2015-01-16 Jakub Jelinek ja...@redhat.com * jit-builtins.h (DEF_FUNCTION_TYPE_VAR_5): Fix spelling of last argument. (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Define and undef afterwards. * jit-builtins.c (DEF_FUNCTION_TYPE_VAR_8, DEF_FUNCTION_TYPE_VAR_12): Likewise. Thanks!
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi Gerald! On Fri, 16 Jan 2015 13:32:10 +0100 (CET), Gerald Pfeifer ger...@pfeifer.com wrote: On Thursday 2015-01-15 21:20, Thomas Schwinge wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! this breaks bootstrap on FreeBSD 8/amd64 from what I can tell: Sorry for that. And, thanks for fixing the num_devices issue. /scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: alloca. h: No such file or directory compilation terminated. % find /usr/include/ -name alloca.h % i.e., FreeBSD does not feature the alloca.h header and declares alloca() in stdlib.h. The fix is simple enough; committed to trunk in r219771, as obvious: commit a6f19a7c6b55f96d0c6dc65914857fc8e9b30aaf Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Jan 16 20:05:21 2015 + libgomp: Don't use alloca.h. libgomp/ * oacc-parallel.c: Don't include alloca.h. (GOACC_parallel): Use gomp_alloca instead of alloca. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219771 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog | 5 + libgomp/oacc-parallel.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git libgomp/ChangeLog libgomp/ChangeLog index 7c106d4..065dfd4 100644 --- libgomp/ChangeLog +++ libgomp/ChangeLog @@ -1,3 +1,8 @@ +2015-01-16 Thomas Schwinge tho...@codesourcery.com + + * oacc-parallel.c: Don't include alloca.h. + (GOACC_parallel): Use gomp_alloca instead of alloca. + 2015-01-16 Gerald Pfeifer ger...@pfeifer.com * target.c (num_devices): Guard with PLUGIN_SUPPORT. diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c index 6d5386b..b5e8060 100644 --- libgomp/oacc-parallel.c +++ libgomp/oacc-parallel.c @@ -34,7 +34,6 @@ #include string.h #include stdarg.h #include assert.h -#include alloca.h static int find_pset (int pos, size_t mapnum, unsigned short *kinds) @@ -151,7 +150,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void *offload_table, tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true, false); - devaddrs = alloca (sizeof (void *) * mapnum); + devaddrs = gomp_alloca (sizeof (void *) * mapnum); for (i = 0; i mapnum; i++) devaddrs[i] = (void *) (tgt-list[i]-tgt-tgt_start + tgt-list[i]-tgt_offset); Grüße, Thomas signature.asc Description: PGP signature
[PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Hi! On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: In r219682, I have committed to trunk our current set of OpenACC changes, Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal parameter, as discussed in https://gcc.gnu.org/PR64625. But -- I now wonder whether that's actually the issue that has been reported in the PR; doesn't that more look like a problem with the __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)? Can anyone guess what's going on? Anyway, as discussed in https://gcc.gnu.org/PR64625, I'd like to commit this patch either way, OK? commit 4409d0129118479c1cd1adbcfa96316ac4e734b0 Author: Thomas Schwinge tho...@codesourcery.com Date: Fri Jan 16 20:12:12 2015 +0100 [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter. gcc/ * omp-low.c (offload_symbol_decl): Remove variable. (get_offload_symbol_decl): Remove function. (expand_omp_target): For BUILT_IN_GOMP_TARGET, BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL instead of __OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START, BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL, BUILT_IN_GOACC_UPDATE don't pass it at all. libgomp/ * libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data) (GOACC_parallel, GOACC_update): Remove const_void *offload_table formal parameter. Update all users. * target.c (GOMP_target, GOMP_target_data, GOMP_target_update): Document unused formal parameter. --- gcc/omp-low.c | 45 ++--- libgomp/libgomp_g.h | 10 +- libgomp/oacc-parallel.c | 8 libgomp/target.c| 11 +-- 4 files changed, 32 insertions(+), 42 deletions(-) diff --git gcc/omp-low.c gcc/omp-low.c index b7bf338..1589310 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -340,30 +340,6 @@ oacc_max_threads (omp_context *ctx) /* Holds offload tables with decls. */ vectree, va_gc *offload_funcs, *offload_vars; -/* Holds a decl for __OFFLOAD_TABLE__. */ -static GTY(()) tree offload_symbol_decl; - -/* Get the __OFFLOAD_TABLE__ symbol. */ -static tree -get_offload_symbol_decl (void) -{ - if (!offload_symbol_decl) -{ - tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier (__OFFLOAD_TABLE__), - ptr_type_node); - TREE_ADDRESSABLE (decl) = 1; - TREE_PUBLIC (decl) = 1; - DECL_EXTERNAL (decl) = 1; - DECL_WEAK (decl) = 1; - DECL_ATTRIBUTES (decl) - = tree_cons (get_identifier (weak), -NULL_TREE, DECL_ATTRIBUTES (decl)); - offload_symbol_decl = decl; -} - return offload_symbol_decl; -} - /* Convenience function for calling scan_omp_1_op on tree operands. */ static inline tree @@ -9119,16 +9095,31 @@ expand_omp_target (struct omp_region *region) } gimple g; - tree offload_table = get_offload_symbol_decl (); vectree *args; /* The maximum number used by any start_ix, without varargs. */ - unsigned int argcnt = 12; + unsigned int argcnt = 11; vec_alloc (args, argcnt); args-quick_push (device); if (offloaded) args-quick_push (build_fold_addr_expr (child_fn)); - args-quick_push (build_fold_addr_expr (offload_table)); + switch (start_ix) +{ +case BUILT_IN_GOMP_TARGET: +case BUILT_IN_GOMP_TARGET_DATA: +case BUILT_IN_GOMP_TARGET_UPDATE: + /* This const void * is part of the current ABI, but we're not actually +using it. */ + args-quick_push (build_zero_cst (ptr_type_node)); + break; +case BUILT_IN_GOACC_DATA_START: +case BUILT_IN_GOACC_ENTER_EXIT_DATA: +case BUILT_IN_GOACC_PARALLEL: +case BUILT_IN_GOACC_UPDATE: + break; +default: + gcc_unreachable (); +} args-quick_push (t1); args-quick_push (t2); args-quick_push (t3); diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h index c1e4e63..5e88d45 100644 --- libgomp/libgomp_g.h +++ libgomp/libgomp_g.h @@ -217,15 +217,15 @@ extern void GOMP_teams (unsigned int, unsigned int); /* oacc-parallel.c */ -extern void GOACC_data_start (int, const void *, - size_t, void **, size_t *, unsigned short *); +extern void GOACC_data_start (int, size_t, void **, size_t *, + unsigned short *); extern void GOACC_data_end (void); -extern void GOACC_enter_exit_data (int, const void *, size_t, void **, +extern void GOACC_enter_exit_data (int, size_t, void **, size_t *, unsigned short *, int, int, ...); -extern void GOACC_parallel (int, void (*) (void *), const void *, size_t, +extern void GOACC_parallel (int, void (*) (void *), size_t, void **, size_t *, unsigned short *, int, int, int, int, int, ...);
Re: Merge current set of OpenACC changes from gomp-4_0-branch
FAIL: c-c++-common/goacc/acc_on_device-2-off.c -std=c++98 scan-rtl-dump-times expand (call [^n]*acc_on_device 1 FAIL: c-c++-common/goacc/acc_on_device-2-off.c -std=c++11 scan-rtl-dump-times expand (call [^n]*acc_on_device 1 FAIL: c-c++-common/goacc/acc_on_device-2-off.c -std=c++14 scan-rtl-dump-times expand (call [^n]*acc_on_device 1 XPASS: c-c++-common/goacc/acc_on_device-2.c -std=c++98 scan-rtl-dump-times expand (call [^n]*acc_on_device 0 XPASS: c-c++-common/goacc/acc_on_device-2.c -std=c++11 scan-rtl-dump-times expand (call [^n]*acc_on_device 0 XPASS: c-c++-common/goacc/acc_on_device-2.c -std=c++14 scan-rtl-dump-times expand (call [^n]*acc_on_device 0 FAIL: c-c++-common/goacc/acc_on_device-2-off.c scan-rtl-dump-times expand (call [^n]*acc_on_device 1 FAIL: gcc.dg/goacc/acc_on_device-1.c scan-rtl-dump-times expand (call [^n]*acc_on_device 4 You are making invalid assumptions about the form of a call pattern. (call_insn 7 6 8 2 (set (reg:SI 0 %d0) (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8]) (const_int 4 [0x4]))) /daten/aranym/gcc/gcc-20150116/gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c:19 -1 (nil) (nil)) Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On 16 Jan 21:34, Thomas Schwinge wrote: On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal parameter, as discussed in https://gcc.gnu.org/PR64625. But -- I now wonder whether that's actually the issue that has been reported in the PR; doesn't that more look like a problem with the __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)? Can anyone guess what's going on? Why do you think so? __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c since November without regressions. -- Ilya
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi! On 15 Jan 21:20, Thomas Schwinge wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! Unfortunately, it broke offloading from shared libraries (I mean common libs with NEEDED entries, not dlopened). Such things are not covered by the testsuite, that's why you missed this issue. Here is a simple testcase: + test.c: + int f_aaa (void); int main () { int x = f_aaa (); #pragma omp target x++; return x; } + libaaa.c: + int f_aaa (void) { int x = 0; #pragma omp target x = 10; return x; } ++ $ gcc -fopenmp -shared -fPIC libaaa.c -o libaaa.so $ gcc -fopenmp -L. -laaa test.c $ ./a.out libgomp: Target function wasn't mapped The problem seems to be here: -gomp_register_images_for_device (struct gomp_device_descr *device) +gomp_register_image_for_device (struct gomp_device_descr *device, + struct offload_image_descr *image) { - int i; - for (i = 0; i num_offload_images; i++) + if (!device-offload_regions_registered + (device-type == image-type + || device-type == OFFLOAD_TARGET_TYPE_HOST)) { - struct offload_image_descr *image = offload_images[i]; - if (image-type == device-type) - device-register_image_func (image-host_table, image-target_data); + device-register_image_func (image-host_table, image-target_data); + device-offload_regions_registered = true; } } So, you don't assume that a device can have multiple images from multiple libs? -- Ilya
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On 86_64 Fedora 15, current gcc trunk only produces… nm libgcc_s.so.1 | grep OFF 00215478 d _GLOBAL_OFFSET_TABLE_ and not __OFFLOAD_TABLE__, The libgcc_s.so.1 built on x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_ symbol. On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote: On 16 Jan 21:34, Thomas Schwinge wrote: On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal parameter, as discussed in https://gcc.gnu.org/PR64625. But -- I now wonder whether that's actually the issue that has been reported in the PR; doesn't that more look like a problem with the __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)? Can anyone guess what's going on? Why do you think so? __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c since November without regressions. -- Ilya
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On 16 Jan 18:22, Jack Howarth wrote: On 86_64 Fedora 15, current gcc trunk only produces… nm libgcc_s.so.1 | grep OFF 00215478 d _GLOBAL_OFFSET_TABLE_ and not __OFFLOAD_TABLE__, The libgcc_s.so.1 built on x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_ symbol. On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote: Why do you think so? __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c since November without regressions. That's correct. 1. offloadstuff.c isn't linked into libgcc_s.so.1 2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in default configuration. -- Ilya
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the requirement for __OFFLOAD_TABLE__ was not longer present and the residual usages of it just had to be removed. The weak symbol on darwin is fragile and seems to trip up on the existing code which produces undefined symbols for ___OFFLOAD_TABLE__... # nm e.50.1.o | grep OFF U ___OFFLOAD_TABLE__ rather than $ nm e.50.1.o | grep OFF w __OFFLOAD_TABLE__ for all of the test cases. On Fri, Jan 16, 2015 at 6:30 PM, Ilya Verbin iver...@gmail.com wrote: On 16 Jan 18:22, Jack Howarth wrote: On 86_64 Fedora 15, current gcc trunk only produces… nm libgcc_s.so.1 | grep OFF 00215478 d _GLOBAL_OFFSET_TABLE_ and not __OFFLOAD_TABLE__, The libgcc_s.so.1 built on x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_ symbol. On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin iver...@gmail.com wrote: Why do you think so? __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c since November without regressions. That's correct. 1. offloadstuff.c isn't linked into libgcc_s.so.1 2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in default configuration. -- Ilya
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
On 16 Jan 19:23, Jack Howarth wrote: As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the requirement for __OFFLOAD_TABLE__ was not longer present and the residual usages of it just had to be removed. The weak symbol on darwin is fragile and seems to trip up on the existing code which produces undefined symbols for ___OFFLOAD_TABLE__... # nm e.50.1.o | grep OFF U ___OFFLOAD_TABLE__ rather than $ nm e.50.1.o | grep OFF w __OFFLOAD_TABLE__ for all of the test cases. I believe that the initial patch, which removes get_offload_symbol_decl, will fix this. -- Ilya
Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)
Confirmed that this patch eliminates [Bug libgomp/64625] ___OFFLOAD_TABLE__ symbol not produced on x86_64 darwin and thus exposes [Bug libgomp/64635] New: darwin produces libgomp-plugin-host_nonshm.1.dylib but tries to load libgomp-plugin-host_nonshm.so.1 The additional hack (which should be fixed with configure/Makefile. changes to detect SHLIBEXT)... @@ -1055,7 +1054,7 @@ static void gomp_target_init (void) { const char *prefix =libgomp-plugin-; - const char *suffix = .so.1; + const char *suffix = .1.dylib; const char *cur, *next; char *plugin_name; to target.c in libgomp eliminates the second bug. Native configuration is x86_64-apple-darwin14.1.0 === libgomp tests === Schedule of variations: unix/-m32 unix/-m64 Running target unix/-m32 Using /sw/share/dejagnu/baseboards/unix.exp as board description file for target. Using /sw/share/dejagnu/config/unix.exp as generic interface file for target. Using /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp as tool-and-target-specific interface file. Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp ... === libgomp Summary for unix/-m32 === # of expected passes 5715 # of unsupported tests 281 Running target unix/-m64 Using /sw/share/dejagnu/baseboards/unix.exp as board description file for target. Using /sw/share/dejagnu/config/unix.exp as generic interface file for target. Using /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp as tool-and-target-specific interface file. Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp ... Running /sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp ... === libgomp Summary for unix/-m64 === # of expected passes 5715 # of unsupported tests 281 === libgomp Summary === # of expected passes 11430 # of unsupported tests 562 On Fri, Jan 16, 2015 at 3:34 PM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! On Thu, 15 Jan 2015 21:20:07 +0100, I wrote: In r219682, I have committed to trunk our current set of OpenACC changes, Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal parameter, as discussed in https://gcc.gnu.org/PR64625. But -- I now wonder whether that's actually the issue that has been reported in the PR; doesn't that more look like a problem with the __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)? Can anyone guess what's going on? Anyway, as discussed in https://gcc.gnu.org/PR64625, I'd like to commit this patch either way, OK? commit 4409d0129118479c1cd1adbcfa96316ac4e734b0 Author: Thomas Schwinge tho...@codesourcery.com Date: Fri Jan 16 20:12:12 2015 +0100 [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter. gcc/ * omp-low.c (offload_symbol_decl): Remove variable. (get_offload_symbol_decl): Remove function. (expand_omp_target): For BUILT_IN_GOMP_TARGET, BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL instead of __OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START, BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL, BUILT_IN_GOACC_UPDATE don't pass it at all. libgomp/ * libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data) (GOACC_parallel, GOACC_update): Remove const_void *offload_table formal parameter. Update all users. * target.c (GOMP_target, GOMP_target_data, GOMP_target_update): Document unused formal parameter. --- gcc/omp-low.c | 45 ++---
Re: Merge current set of OpenACC changes from gomp-4_0-branch
On 01/15/15 13:20, Thomas Schwinge wrote: Hi! In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! Note that this is an experimental feature, incomplete, and subject to change in future versions of GCC. We shall update -- and keep updated -- https://gcc.gnu.org/wiki/OpenACC, to track the current status. (Please come back to that page in a few days, it has not yet been updated.) Please note that there are still a handful of patches pending (posted weeks ago, need to ping) that are needed for nvptx offloading, so that's not yet functional. Definitely ping them. We're trying hard to get things closed down and knowing what's still out there is very important. jeff
Re: Merge current set of OpenACC changes from gomp-4_0-branch
Hi Thomas, thanks to you and all others involved for the OpenACC merge. Attached is a patch which converts for Fortran '%s' into %qs, as mentioned to before. (It wasn't possible when the original patch was reviewed as the common diagnostic patches came later.) Committed as Rev. 219694. On the Fortran side: Compared with C/C++, support for acc cache seems to be missing (PR63865, contains draft-patch link). (Other PRs: the device_resident clause is not supported (PR63859) - I don't know whether it is supported in C/C++ or not. And there are two ICEs (PR63865, PR63858).) – Is some work on those planed on your side for GCC 5 or more likely not? Tobias Thomas Schwinge wrote: In r219682, I have committed to trunk our current set of OpenACC changes, which we had prepared on gomp-4_0-branch. Thanks to everyone who has been contributing! Note that this is an experimental feature, incomplete, and subject to change in future versions of GCC. We shall update -- and keep updated -- https://gcc.gnu.org/wiki/OpenACC, to track the current status. (Please come back to that page in a few days, it has not yet been updated.) Please note that there are still a handful of patches pending (posted weeks ago, need to ping) that are needed for nvptx offloading, so that's not yet functional. Here's the commit log. The patch itself is too big to post inline, so please find it attached, gzipped. 2015-01-15 Tobias Burnus bur...@net-b.de * openmp.c (check_symbol_not_pointer, resolve_oacc_data_clauses, resolve_oacc_deviceptr_clause, resolve_omp_clauses, gfc_resolve_oacc_declare): Replace '%s' by %qs. diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 005739b..422e977 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -1457,7 +1457,7 @@ gfc_match_oacc_routine (void) if (!sym-attr.external !sym-attr.function !sym-attr.subroutine) { gfc_error (Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid - function name '%s', sym-name); + function name %qs, sym-name); gfc_current_locus = old_loc; return MATCH_ERROR; } @@ -2649,29 +2649,29 @@ static void check_symbol_not_pointer (gfc_symbol *sym, locus loc, const char *name) { if (sym-ts.type == BT_DERIVED sym-attr.pointer) -gfc_error (POINTER object '%s' of derived type in %s clause at %L, +gfc_error (POINTER object %qs of derived type in %s clause at %L, sym-name, name, loc); if (sym-ts.type == BT_DERIVED sym-attr.cray_pointer) -gfc_error (Cray pointer object of derived type '%s' in %s clause at %L, +gfc_error (Cray pointer object of derived type %qs in %s clause at %L, sym-name, name, loc); if (sym-ts.type == BT_DERIVED sym-attr.cray_pointee) -gfc_error (Cray pointee object of derived type '%s' in %s clause at %L, +gfc_error (Cray pointee object of derived type %qs in %s clause at %L, sym-name, name, loc); if ((sym-ts.type == BT_ASSUMED sym-attr.pointer) || (sym-ts.type == BT_CLASS CLASS_DATA (sym) CLASS_DATA (sym)-attr.pointer)) -gfc_error (POINTER object '%s' of polymorphic type in %s clause at %L, +gfc_error (POINTER object %qs of polymorphic type in %s clause at %L, sym-name, name, loc); if ((sym-ts.type == BT_ASSUMED sym-attr.cray_pointer) || (sym-ts.type == BT_CLASS CLASS_DATA (sym) CLASS_DATA (sym)-attr.cray_pointer)) -gfc_error (Cray pointer object of polymorphic type '%s' in %s clause at %L, +gfc_error (Cray pointer object of polymorphic type %qs in %s clause at %L, sym-name, name, loc); if ((sym-ts.type == BT_ASSUMED sym-attr.cray_pointee) || (sym-ts.type == BT_CLASS CLASS_DATA (sym) CLASS_DATA (sym)-attr.cray_pointee)) -gfc_error (Cray pointee object of polymorphic type '%s' in %s clause at %L, +gfc_error (Cray pointee object of polymorphic type %qs in %s clause at %L, sym-name, name, loc); } @@ -2681,14 +2681,14 @@ static void check_array_not_assumed (gfc_symbol *sym, locus loc, const char *name) { if (sym-as sym-as-type == AS_ASSUMED_SIZE) -gfc_error (Assumed size array '%s' in %s clause at %L, +gfc_error (Assumed size array %qs in %s clause at %L, sym-name, name, loc); if (sym-as sym-as-type == AS_ASSUMED_RANK) -gfc_error (Assumed rank array '%s' in %s clause at %L, +gfc_error (Assumed rank array %qs in %s clause at %L, sym-name, name, loc); if (sym-as sym-as-type == AS_DEFERRED sym-attr.pointer !sym-attr.contiguous) -gfc_error (Noncontiguous deferred shape array '%s' in %s clause at %L, +gfc_error (Noncontiguous deferred shape array %qs in %s clause at %L, sym-name, name, loc); } @@ -2696,12 +2696,12 @@ static void resolve_oacc_data_clauses (gfc_symbol *sym, locus loc, const char *name) { if (sym-ts.type == BT_DERIVED sym-attr.allocatable) -gfc_error (ALLOCATABLE object '%s' of derived type in %s clause at %L, +gfc_error