On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > How's the following (complete patch instead of incremental patch; the
> > driver changes are still the same as before)?  The changes are:
> > 
> >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> >   * libgomp/target.c:resolve_device and
> >     libgomp/oacc-init.c:resolve_device verify that a default device
> >     (OpenMP device-var ICV, and acc_device_default, respectively) is
> >     actually enabled, or resort to host fallback if not.
> >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> >     to enable devices specified by -foffload.  Can be called multiple
> >     times (executable, any shared libraries); the set of enabled devices
> >     is the union of all those ever requested.
> >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> >     changed to enable all devices.  This is to maintain compatibility
> >     with old executables and shared libraries built without the -foffload
> >     constructor support.

Any reason not to pass the bitmask of the enabled targets to
GOMP_offload_register_ver instead, to decrease the amount of ctors and
the times you lock the various locks during initialization, or just enable
automatically the devices you load data for during GOMP_offload_register_ver?
I mean, GOMP_offload_register would enable for compatibility all devices,
GOMP_offload_register_ver would enable the device it is registered for.
For -foffload=disable on all shared libraries/binaries, naturally you would
not register anything, thus would not enable any devices (only host fallback
would work).

Or are you worried about the case where one shared library is compiled
with say -foffload=intelmic,ptx but doesn't actually contain any
#pragma omp target/#pragma omp declare target (or OpenACC similar
#directives), but only contains #pragma omp target data and/or the device
query/copying routines, then dlopens some other shared library that actually
has the offloading device code?
That could be solved by adding the call you are talking about, but
if we really should care about that unlikely case, it would be better to
only arrange for it if really needed by the shared library (i.e. if it calls
one of the OpenMP or OpenACC library routines that talk to the devices, or
has #pragma omp target data or similar constructs;
I'd strongly prefer not to have constructors in code that just got compiled
with -fopenmp, even in configuration where some offloading is configured by
default, when nothing in the code really cares about offloading.

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -401,6 +401,8 @@ static const char 
> *compare_debug_auxbase_opt_spec_function (int, const char **);
>  static const char *pass_through_libs_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (int, const char **);
>  static const char *greater_than_spec_func (int, const char **);
> +static const char *add_omp_infile_spec_func (int, const char **);
> +
>  static char *convert_white_space (char *);
>  
>  /* The Specs Language

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?

> diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
> index 24fbb94..5da4fa7 100644
> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -48,7 +48,8 @@ enum offload_target_type
>    OFFLOAD_TARGET_TYPE_HOST = 2,
>    /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
>    OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> +  OFFLOAD_TARGET_TYPE_HWM

What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

> diff --git a/libgomp/target.c b/libgomp/target.c
> index b767410..df51bfb 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -72,6 +72,9 @@ static int num_offload_images;
>  /* Array of descriptors for all available devices.  */
>  static struct gomp_device_descr *devices;
>  
> +/* Set of enabled devices.  */
> +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

I must say I don't like the locking for this.
If all you ever change on this is that you change it from 0 to 1,
then supposedly just storing it with __atomic_store, perhaps with
rel semantics, and reading it as __atomic_load, with acquire semantics,
would be good enough?  And perhaps change it into int array,
so that it is actually atomic even on the old Alphas (if there are any
around).

        Jakub

Reply via email to