Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Jakub Jelinek
On Fri, Jul 31, 2015 at 07:53:16PM +0300, Ilya Verbin wrote:
 On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
  I've noticed that target MIC compiler from trunk hangs forever in
  lto_input_mode_table in this loop, even on simple testcases.
  
  On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
  +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
  +if not found, fallback to all modes.  */
  +  int pass;
  +  for (pass = 0; pass < 2; pass++)
  +   for (machine_mode mr = pass ? VOIDmode
  +   : GET_CLASS_NARROWEST_MODE (mclass);
  +pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
  +pass ? mr = (machine_mode) (m + 1)
  + : mr = GET_MODE_WIDER_MODE (mr))
  + if (GET_MODE_CLASS (mr) != mclass
  + || GET_MODE_SIZE (mr) != size
  + || GET_MODE_PRECISION (mr) != prec
  + || GET_MODE_INNER (mr) != inner
  + || GET_MODE_IBIT (mr) != ibit
  + || GET_MODE_FBIT (mr) != fbit
  + || GET_MODE_NUNITS (mr) != nunits)
  +   continue;
  
  Given that gomp-4_1-branch works ok, the problem was introduced somewhere
  between 9 and 31 Jul.  I'll try to find the revision.
 
 Shouldn't 'mr' be here instead of 'm'?

I think so.  If it works, patch preapproved.
But I wonder what changed that we haven't been triggering it before.
What mode does it hang on (mclass/size/prec/inner/ibit/fbit/nunits)?

Jakub


Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
Hi!

I've noticed that target MIC compiler from trunk hangs forever in
lto_input_mode_table in this loop, even on simple testcases.

On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
+  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
+if not found, fallback to all modes.  */
+  int pass;
+  for (pass = 0; pass < 2; pass++)
+   for (machine_mode mr = pass ? VOIDmode
+   : GET_CLASS_NARROWEST_MODE (mclass);
+pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
+pass ? mr = (machine_mode) (m + 1)
+ : mr = GET_MODE_WIDER_MODE (mr))
+ if (GET_MODE_CLASS (mr) != mclass
+ || GET_MODE_SIZE (mr) != size
+ || GET_MODE_PRECISION (mr) != prec
+ || GET_MODE_INNER (mr) != inner
+ || GET_MODE_IBIT (mr) != ibit
+ || GET_MODE_FBIT (mr) != fbit
+ || GET_MODE_NUNITS (mr) != nunits)
+   continue;

Given that gomp-4_1-branch works ok, the problem was introduced somewhere
between 9 and 31 Jul.  I'll try to find the revision.

  -- Ilya


Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
 I've noticed that target MIC compiler from trunk hangs forever in
 lto_input_mode_table in this loop, even on simple testcases.
 
 On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
 +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
 +  if not found, fallback to all modes.  */
 +  int pass;
 +  for (pass = 0; pass < 2; pass++)
 + for (machine_mode mr = pass ? VOIDmode
 + : GET_CLASS_NARROWEST_MODE (mclass);
 +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
 +  pass ? mr = (machine_mode) (m + 1)
 +   : mr = GET_MODE_WIDER_MODE (mr))
 +   if (GET_MODE_CLASS (mr) != mclass
 +   || GET_MODE_SIZE (mr) != size
 +   || GET_MODE_PRECISION (mr) != prec
 +   || GET_MODE_INNER (mr) != inner
 +   || GET_MODE_IBIT (mr) != ibit
 +   || GET_MODE_FBIT (mr) != fbit
 +   || GET_MODE_NUNITS (mr) != nunits)
 + continue;
 
 Given that gomp-4_1-branch works ok, the problem was introduced somewhere
 between 9 and 31 Jul.  I'll try to find the revision.

Shouldn't 'mr' be here instead of 'm'?

 mr = (machine_mode) (m + 1)

  -- Ilya


Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)

2015-07-31 Thread Ilya Verbin
On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
 On Fri, Jul 31, 2015 at 07:53:16PM +0300, Ilya Verbin wrote:
  On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
   I've noticed that target MIC compiler from trunk hangs forever in
   lto_input_mode_table in this loop, even on simple testcases.
   
   On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
   +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
   +  if not found, fallback to all modes.  */
   +  int pass;
   +  for (pass = 0; pass < 2; pass++)
   + for (machine_mode mr = pass ? VOIDmode
   + : GET_CLASS_NARROWEST_MODE (mclass);
   +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
   +  pass ? mr = (machine_mode) (m + 1)
   +   : mr = GET_MODE_WIDER_MODE (mr))
   +   if (GET_MODE_CLASS (mr) != mclass
   +   || GET_MODE_SIZE (mr) != size
   +   || GET_MODE_PRECISION (mr) != prec
   +   || GET_MODE_INNER (mr) != inner
   +   || GET_MODE_IBIT (mr) != ibit
   +   || GET_MODE_FBIT (mr) != fbit
   +   || GET_MODE_NUNITS (mr) != nunits)
   + continue;
   
   Given that gomp-4_1-branch works ok, the problem was introduced somewhere
   between 9 and 31 Jul.  I'll try to find the revision.
  
  Shouldn't 'mr' be here instead of 'm'?
 
 I think so.  If it works, patch preapproved.

It fixes the infinite loop, but causes an error:
lto1: fatal error: unsupported mode QI

 But I wonder what changed that we haven't been triggering it before.
 What mode does it hang on (mclass/size/prec/inner/ibit/fbit/nunits)?

When it hangs, mr is HImode.

  -- Ilya


Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)

2015-02-20 Thread Thomas Schwinge
Hi Bernd!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt ber...@codesourcery.com 
wrote:
 issue when trying to 
 get at the libgcc for the nvptx accel compiler after it's been 
 installed. The libgcc Makefile puts it in the wrong place - 
 gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
 The patch below corrects that and removes an intelmicemul special case 
 which I believe has the same effect - Ilya, could you test this?

Works fine for me for intelmic (no changes), and nvptx (changes as
expected).

You'll want to remove the following debugging print statement before
commit:

 --- libgcc/configure.ac   (revision 445788)
 +++ libgcc/configure.ac   (working copy)
 @@ -398,16 +398,14 @@ esac
  
  # Used for constructing correct paths for offload compilers.
  accel_dir_suffix=
 +real_host_noncanonical=${host_noncanonical}
 +echo eaaf: $enable_as_accelerator_for


Grüße,
 Thomas


Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)

2015-02-20 Thread Thomas Schwinge
Hi!

On Thu, 19 Feb 2015 11:51:02 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
  Like this?
 
 Yes.
 
  commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
  Author: Thomas Schwinge tho...@codesourcery.com
  Date:   Thu Feb 19 11:41:23 2015 +0100
  
  If we're building an offloading compiler, always enable the LTO front 
  end.
  
  * configure.ac [--enable-as-accelerator-for] (enable_languages):
  Make sure it contains lto.
  * configure: Regenerate.
 
 Ok for trunk.

Committed in r220838.
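
For reference, the check this commit adds (append lto to enable_languages
unless it is already listed) can be tried in isolation.  This is only a
sketch: the function name ensure_lto is made up for illustration and does
not exist in configure.ac.

```shell
#!/bin/sh
# Hypothetical helper, not part of GCC's configure: append "lto" to a
# comma-separated language list unless it is already present.
ensure_lto () {
  langs=$1
  # Wrapping the list in commas lets one pattern match "lto" at the
  # start, in the middle, or at the end of the list.
  case ,${langs}, in
    *,lto,*) ;;
    *) langs=${langs},lto ;;
  esac
  printf '%s\n' "$langs"
}

ensure_lto c,c++,fortran   # lto is missing, so it gets appended
ensure_lto c,lto,c++       # lto is already listed, list is unchanged
```

The comma wrapping is the same trick the existing LTO block in toplevel
configure.ac uses, so a language whose name merely contains "lto" would
not match.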


Grüße,
 Thomas


Re: Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)

2015-02-20 Thread Ilya Verbin
On Fri, Feb 20, 2015 at 10:27:26 +0100, Thomas Schwinge wrote:
 On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt ber...@codesourcery.com 
 wrote:
  issue when trying to 
  get at the libgcc for the nvptx accel compiler after it's been 
  installed. The libgcc Makefile puts it in the wrong place - 
  gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
  The patch below corrects that and removes an intelmicemul special case 
  which I believe has the same effect - Ilya, could you test this?
 
 Works fine for me for intelmic (no changes), and nvptx (changes as
 expected).

OK to me.

Thanks,
  -- Ilya


Re: nvptx offloading patches [3/n], RFD

2015-02-19 Thread Bernd Schmidt

On 02/17/2015 05:40 PM, Jakub Jelinek wrote:

On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:

On Tue, 17 Feb 2015, Jakub Jelinek wrote:


Third attempt failed with:
../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such 
file or directory
compilation terminated.
../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
make[2]: *** [realloc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
would be built in-tree, is that not the case (at least wiki/Offloading
mentions that).  Or is it just that libgcc can't really have dependencies on
newlib headers as newlib is built after libgcc?


I've committed this patch to fix this last issue (the header dependence,
that is; I don't know about the in-tree build).


Thanks, sure, libgcc now builds fine, the in-tree build fails:
configure:4261: checking for C compiler default output file name
configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ 
-nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
/usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
/usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
-B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem 
/usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
error opening libc.a
collect2: error: ld returned 1 exit status
very early during in-tree newlib configure.


Not a fix for your problem, but there's a similar issue when trying to 
get at the libgcc for the nvptx accel compiler after it's been 
installed. The libgcc Makefile puts it in the wrong place - 
gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 
The patch below corrects that and removes an intelmicemul special case 
which I believe has the same effect - Ilya, could you test this?
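
To make the path difference concrete, here is a rough sketch of how
libsubdir is composed with host_noncanonical versus with the new
real_host_noncanonical.  The libdir value is a placeholder, the version
and host triplet are borrowed from build logs elsewhere in this thread,
and the helper name libsubdir_for is invented for the example.

```shell
#!/bin/sh
# Illustrative values only: libdir is a placeholder; 5.0.0 and the host
# triplet are taken from logs in this thread.
libdir=/usr/local/lib
version=5.0.0
target_noncanonical=nvptx-none
enable_as_accelerator_for=x86_64-pc-linux-gnu
accel_dir_suffix=/accel/${target_noncanonical}

# Assumption: when libgcc is configured for the accel compiler, its
# "host" is the accelerator target itself.
host_noncanonical=nvptx-none

# Hypothetical helper mirroring the libsubdir line in libgcc/Makefile.in;
# $1 is the name used for the host directory component.
libsubdir_for () {
  printf '%s/gcc/%s/%s%s\n' "$libdir" "$1" "$version" "$accel_dir_suffix"
}

libsubdir_for "$host_noncanonical"          # layout before the patch
libsubdir_for "$enable_as_accelerator_for"  # layout after the patch
```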



Bernd

Index: libgcc/Makefile.in
===
--- libgcc/Makefile.in	(revision 445788)
+++ libgcc/Makefile.in	(working copy)
@@ -45,6 +45,7 @@ fixed_point = @fixed_point@
 with_aix_soname = @with_aix_soname@
 
 host_noncanonical = @host_noncanonical@
+real_host_noncanonical = @real_host_noncanonical@
 target_noncanonical = @target_noncanonical@
 
 # List of extra object files that should be compiled for this target machine.
@@ -185,7 +186,7 @@ STRIP = @STRIP@
 STRIP_FOR_TARGET = $(STRIP)
 
 # Directory in which the compiler finds libraries etc.
-libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
+libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
 # Used to install the shared libgcc.
 slibdir = @slibdir@
 # Maybe used for DLLs on Windows targets.
Index: libgcc/configure.ac
===
--- libgcc/configure.ac	(revision 445788)
+++ libgcc/configure.ac	(working copy)
@@ -398,16 +398,14 @@ esac
 
 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo eaaf: $enable_as_accelerator_for
 if test x$enable_as_accelerator_for != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case ${target_noncanonical} in
-*-intelmicemul-*)
-  # In this case we expect offload compiler to be built as native, so we
-  # need to change install directory for driver to be able to find libgcc.
-  host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi
 AC_SUBST(accel_dir_suffix)
+AC_SUBST(real_host_noncanonical)
 
 if test x$enable_offload_targets != x; then
   extra_parts=${extra_parts} crtoffloadbegin.o crtoffloadend.o
Index: libgcc/configure
===
--- libgcc/configure	(revision 445788)
+++ libgcc/configure	(working copy)
@@ -566,6 +566,7 @@ sfp_machine_header
 set_use_emutls
 set_have_cc_tls
 vis_hide
+real_host_noncanonical
 accel_dir_suffix
 force_explicit_eh_registry
 fixed_point
@@ -4482,17 +4483,15 @@ esac
 
 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo eaaf: $enable_as_accelerator_for
 if test x$enable_as_accelerator_for != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case ${target_noncanonical} in
-*-intelmicemul-*)
-  # In this case we expect offload compiler to be built as native, so we
-  # need to change install directory for driver to be able to find libgcc.
-  host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi
 
 
+
 if test x$enable_offload_targets != x; then
   extra_parts=${extra_parts} crtoffloadbegin.o crtoffloadend.o
 fi


If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)

2015-02-19 Thread Thomas Schwinge
Hi!

On Wed, 18 Feb 2015 13:35:18 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
  On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek ja...@redhat.com wrote:
   offloading fails:
   
   /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
@/tmp/cce9PdmR
   x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not 
   recognized
   x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not 
   recognized
   mkoffload: fatal error: 
   /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
returned 1 exit status
   compilation terminated.
   lto-wrapper: fatal error: 
   /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
returned 1 exit status
   compilation terminated.
   /usr/bin/ld: lto-wrapper failed
   collect2: error: ld returned 1 exit status
   
   Is --enable-languages=c,c++,fortran,lto required when configuring the
   offload compiler?  It isn't required for intelmic.
  
  Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
  defaults to supporting LTO, and due to this also defaults to building the
  LTO front end.  I'll enhance the nvptx offloading documentation
  accordingly.  Maybe we should add some magic to build the LTO front end
  if --enable-as-accelerator-for=[...] has been specified?
 
 Toplevel configure.ac has:
   # If LTO is enabled, add the LTO front end.
   if test $enable_lto = yes ; then
 case ,${enable_languages}, in
   *,lto,*) ;;
   *) enable_languages=${enable_languages},lto ;;
 esac
 if test ${build_lto_plugin} = yes ; then
   configdirs=$configdirs lto-plugin
 fi
   fi
 so IMHO we want a similar snippet for the --enable-as-accelerator-for= case,
 perhaps right below this one.  Not building the lto FE for the accelerator
 compilers makes them completely useless, thus I think we really want to do
 that automatically.

Like this?

commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Thu Feb 19 11:41:23 2015 +0100

If we're building an offloading compiler, always enable the LTO front end.

* configure.ac [--enable-as-accelerator-for] (enable_languages):
Make sure it contains lto.
* configure: Regenerate.
---
 configure    | 8 ++++++++
 configure.ac | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git configure configure
index dd794db..2afc52b 100755
--- configure
+++ configure
@@ -6217,6 +6217,14 @@ if test -d ${srcdir}/gcc; then
 fi
   fi
 
+  # If we're building an offloading compiler, add the LTO front end.
+  if test x$enable_as_accelerator_for != x ; then
+case ,${enable_languages}, in
+  *,lto,*) ;;
+  *) enable_languages=${enable_languages},lto ;;
+esac
+  fi
+
   missing_languages=`echo ,$enable_languages, | sed -e s/,all,/,/ -e 
s/,c,/,/ `
   potential_languages=,c,
 
diff --git configure.ac configure.ac
index 4ea5e00..08a6fbf 100644
--- configure.ac
+++ configure.ac
@@ -1918,6 +1918,14 @@ if test -d ${srcdir}/gcc; then
 fi
   fi
 
+  # If we're building an offloading compiler, add the LTO front end.
+  if test x$enable_as_accelerator_for != x ; then
+case ,${enable_languages}, in
+  *,lto,*) ;;
+  *) enable_languages=${enable_languages},lto ;;
+esac
+  fi
+
   missing_languages=`echo ,$enable_languages, | sed -e s/,all,/,/ -e 
s/,c,/,/ `
   potential_languages=,c,
 


Grüße,
 Thomas


Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD)

2015-02-19 Thread Thomas Schwinge
Hi!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt ber...@codesourcery.com 
wrote:
 On 02/17/2015 05:40 PM, Jakub Jelinek wrote:
  On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
  On Tue, 17 Feb 2015, Jakub Jelinek wrote:
 
  Third attempt failed with:
  ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No 
  such file or directory
  compilation terminated.
  ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
  make[2]: *** [realloc.o] Error 1
  make[2]: *** Waiting for unfinished jobs....
  make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
  I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected 
  it
  would be built in-tree, is that not the case (at least wiki/Offloading
  mentions that).  Or is it just that libgcc can't really have dependencies 
  on
  newlib headers as newlib is built after libgcc?
 
  I've committed this patch to fix this last issue (the header dependence,
  that is; I don't know about the in-tree build).
 
  Thanks, sure, libgcc now builds fine, the in-tree build fails:
  configure:4261: checking for C compiler default output file name
  configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc 
  -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc 
  -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
  /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
  /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
  -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include 
  -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
  error opening libc.a
  collect2: error: ld returned 1 exit status
  very early during in-tree newlib configure.
 
 Not a fix for your problem, but there's a similar issue when trying to 
 get at the libgcc for the nvptx accel compiler after it's been 
 installed. The libgcc Makefile puts it in the wrong place - 
 gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none. 

I also wondered about this; it's somewhere on my TODO list...

 The patch below corrects that and removes an intelmicemul special case 
 which I believe has the same effect - Ilya, could you test this?

This code has originally been posted in
http://news.gmane.org/find-root.php?message_id=%3C20140926123551.GA6892%40msticlxl57.ims.intel.com%3E.

This specific buglet aside (that the handling of intelmic and nvptx
offloading is inconsistent) -- will we have to add such handling to each
and every library that is built for the offloading compilers?  (Including
libraries that aren't part of the GCC sources, but may be built as part
of GCC's build process, such as when newlib is linked into [GCC]/newlib?)

One step back -- I understand correctly that this change is to make sure
that the regular target compiler and the offloading compilers don't clash
in their installed files' names?  (By putting them into the
accel/[offloading architecture]/ subdirectory?)  (As I've written in
http://news.gmane.org/find-root.php?message_id=%3C87vbize7zi.fsf%40schwinge.name%3E,
I currently install into separate prefixes/DESTDIRS, because I have not
yet verified that there is no overlap in the installed files.)

Then, why does this only apply to libsubdir?  What about header files,
documentation files, and so on?  (If they aren't expected to differ
between the target and offloading compilers, I think it's still not a
good idea to arbitrarily have them be overwritten by one respective build
tree's make install process.)  Should we have a more general solution to
this problem?

 Index: libgcc/Makefile.in
 ===
 --- libgcc/Makefile.in(revision 445788)
 +++ libgcc/Makefile.in(working copy)
 @@ -45,6 +45,7 @@ fixed_point = @fixed_point@
  with_aix_soname = @with_aix_soname@
  
  host_noncanonical = @host_noncanonical@
 +real_host_noncanonical = @real_host_noncanonical@
  target_noncanonical = @target_noncanonical@
  
  # List of extra object files that should be compiled for this target machine.
 @@ -185,7 +186,7 @@ STRIP = @STRIP@
  STRIP_FOR_TARGET = $(STRIP)
  
  # Directory in which the compiler finds libraries etc.
 -libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
 +libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
  # Used to install the shared libgcc.
  slibdir = @slibdir@
  # Maybe used for DLLs on Windows targets.
 Index: libgcc/configure.ac
 ===
 --- libgcc/configure.ac   (revision 445788)
 +++ libgcc/configure.ac   (working copy)
 @@ -398,16 +398,14 @@ esac
  
  # Used for constructing correct paths for offload compilers.
  accel_dir_suffix=
 +real_host_noncanonical=${host_noncanonical}
 +echo eaaf: $enable_as_accelerator_for
  if test x$enable_as_accelerator_for != x; then
accel_dir_suffix=/accel/${target_noncanonical}
 -  case ${target_noncanonical} in
 -

Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)

2015-02-19 Thread Jakub Jelinek
On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
 Like this?

Yes.

 commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
 Author: Thomas Schwinge tho...@codesourcery.com
 Date:   Thu Feb 19 11:41:23 2015 +0100
 
 If we're building an offloading compiler, always enable the LTO front end.
 
   * configure.ac [--enable-as-accelerator-for] (enable_languages):
   Make sure it contains lto.
   * configure: Regenerate.

Ok for trunk.

Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek
On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
 On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek ja...@redhat.com wrote:
  On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
   On Tue, 17 Feb 2015, Jakub Jelinek wrote:
I have nvptx-newlib symlinked into the gcc tree as newlib, so I 
expected it
would be built in-tree, is that not the case (at least wiki/Offloading
mentions that).
 
  configure:4261: checking for C compiler default output file name
  configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc 
  -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc 
  -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
  /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
  /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
  -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include 
  -isystem /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
  error opening libc.a
  collect2: error: ld returned 1 exit status
  very early during in-tree newlib configure.
 
 Do you literally have »nvptx-newlib symlinked into the gcc tree as
 newlib«?  If yes, then that should explain the problem: as I wrote in
 http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
 you need to »add a symbolic link to nvptx-newlib's newlib directory to
 the directory containing the GCC sources«, so not link [GCC]/newlib ->
 [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
 resolve the issue?

My bad.  Yes, that does resolve the issue, make && make install now worked
for nvptx-none for me with the patches (2 from Bernd, my mode_table, my
t-nvptx).

Can you or Bernd comment on the other issues I've raised, i.e. whether you
are going to apply Bernd's approved patches, on the t-nvptx fix?

I'll try to have a look at the va_list stuff, if it blocks everything rather
than just testcases with va_list being offloaded.

Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge
Hi!

On Mon, 16 Feb 2015 22:08:12 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
  I think (also communicated that on IRC) we should instead try not streaming
  machine-modes at all but generating them at stream-in time via layout_type
  or layout_decl.
 
 Here is a WIP prototype for being able to stream a machine mode description
 table and streaming it back in.  [...]

Many thanks for that!  (I had modified Bernd's patch to be less
intrusive, see attached, but of course that didn't resolve its design
problem.)

On Mon, 16 Feb 2015 22:43:49 +0100, Jakub Jelinek ja...@redhat.com wrote:
 [updated patch]

No regressions with
--enable-offload-targets=nvptx-none=[...],x86_64-intelmicemul-linux-gnu=[...].


Grüße,
 Thomas


commit 97a1ad0d3a96321ded8fad5e3a3cc75b46970bfa
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Fri Feb 13 19:51:09 2015 +0100

Use the offload host CPU's modes.def when building an offloading compiler: make it less intrusive.

diff --git gcc/config.gcc gcc/config.gcc
index ebf0ee6..265ac0e 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -482,15 +482,15 @@ tilepro*-*-*)
 	;;
 esac
 
-offload_host_cpu_type=${cpu_type}
-if test x${enable_as_accelerator} != xno
-then
-	offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
-fi
-case ${offload_host_cpu_type} in
-x86_64)
-  offload_host_cpu_type=i386
-	  ;;
+modes_cpu_type=${cpu_type}
+case ${enable_as_accelerator}:${target} in
+yes:nvptx-*-*)
+	modes_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+	case ${modes_cpu_type} in
+	x86_64)
+		modes_cpu_type=i386
+		;;
+	esac
 esac
 
 tm_file=${cpu_type}/${cpu_type}.h
@@ -499,9 +499,9 @@ then
 	tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+if test -f ${srcdir}/config/${modes_cpu_type}/${modes_cpu_type}-modes.def
 then
-	extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+	extra_modes=${modes_cpu_type}/${modes_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
diff --git gcc/config/i386/i386-modes.def gcc/config/i386/i386-modes.def
index 766681b..0b6a1f1 100644
--- gcc/config/i386/i386-modes.def
+++ gcc/config/i386/i386-modes.def
@@ -24,9 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
-/* This file may be used when building a compiler for an offload target.
-   Assume that no special floating point options are used.  */
-#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -36,7 +33,6 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
 			  : ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
-#endif
 
 /* Add any extra modes needed to represent the condition code.
 
diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index 9a9954b..c0d97ee 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -64,6 +64,14 @@
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64
 
+#ifdef ACCEL_COMPILER
+/* For ../i386/i386-modes.def.  */
+/* See ../i386/unix.h:TARGET_SUBTARGET64_DEFAULT.  */
+# define TARGET_128BIT_LONG_DOUBLE (TARGET_ABI64)
+/* See ../i386/i386.h:TARGET_96_ROUND_53_LONG_DOUBLE.  */
+# define TARGET_96_ROUND_53_LONG_DOUBLE 0
+#endif
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
 #undef PTRDIFF_TYPE


Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge
Hi!

On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
  On Tue, 17 Feb 2015, Jakub Jelinek wrote:
   I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected 
   it
   would be built in-tree, is that not the case (at least wiki/Offloading
   mentions that).

 configure:4261: checking for C compiler default output file name
 configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc 
 -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc 
 -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
 /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
 /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
 -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem 
 /usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
 error opening libc.a
 collect2: error: ld returned 1 exit status
 very early during in-tree newlib configure.

Do you literally have »nvptx-newlib symlinked into the gcc tree as
newlib«?  If yes, then that should explain the problem: as I wrote in
http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
you need to »add a symbolic link to nvptx-newlib's newlib directory to
the directory containing the GCC sources«, so not link [GCC]/newlib ->
[newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
resolve the issue?
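
In case it helps, the intended link layout can be sketched with throwaway
directories.  The paths are placeholders created under a temporary
directory purely for illustration; substitute the real GCC and
nvptx-newlib checkouts.

```shell
#!/bin/sh
# Placeholder directories standing in for the real source checkouts;
# both names are assumptions made for this sketch.
top=$(mktemp -d)
mkdir -p "$top/gcc" "$top/nvptx-newlib/newlib"

# Not this: the link would point at the top of the nvptx-newlib checkout.
#   ln -s "$top/nvptx-newlib" "$top/gcc/newlib"

# But this: the link points at the newlib directory inside the checkout.
ln -s "$top/nvptx-newlib/newlib" "$top/gcc/newlib"

# Show where the link points.
ls -ld "$top/gcc/newlib"
```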


Grüße,
 Thomas


Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek
On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
 Do you literally have »nvptx-newlib symlinked into the gcc tree as
 newlib«?  If yes, then that should explain the problem: as I wrote in
 http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
 you need to »add a symbolic link to nvptx-newlib's newlib directory to
 the directory containing the GCC sources«, so not link [GCC]/newlib ->
 [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
 resolve the issue?

BTW, --with-cuda-driver-{include,lib} are apparently not documented in
gcc/doc/ (--with-cuda-driver neither, but I can't use that, as lib is
/usr/local/cuda-6.5/lib64 in my case), and aren't documented on wiki/Offloading
either.

../configure --target=nvptx-none 
--enable-as-accelerator-for=x86_64-pc-linux-gnu 
--with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin 
--disable-sjlj-exceptions --enable-newlib-io-long-long
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

and

../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
--target=x86_64-pc-linux-gnu 
--enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst 
--disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include 
--with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

compilers now build, but offloading fails:

/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
 @/tmp/cce9PdmR
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
mkoffload: fatal error: 
/usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
 returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: 
/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
 returned 1 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Is --enable-languages=c,c++,fortran,lto required when configuring the
offload compiler?  It isn't required for intelmic.

Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek
On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
 I'm just looking for a way to make this less of a hack (and the LTO IL
 less target dependent).  Not for GCC 5 for which something like your
 patch is probably ok, but for the future.

So, given Ilya's and Thomas' testing, is this acceptable for now, and
perhaps we can try to do something better for GCC 6?

Here is the patch with full ChangeLog:

2015-02-18  Jakub Jelinek  ja...@redhat.com

* passes.c (ipa_write_summaries_1): Call lto_output_init_mode_table.
(ipa_write_optimization_summaries): Likewise.
* tree-streamer.h: Include data-streamer.h.
(streamer_mode_table): Declare extern variable.
(bp_pack_machine_mode, bp_unpack_machine_mode): New inline functions.
* lto-streamer-out.c (lto_output_init_mode_table,
lto_write_mode_table): New functions.
(produce_asm_for_decls): Call lto_write_mode_table when streaming
offloading LTO.
* lto-section-in.c (lto_section_name): Add mode_table entry.
(lto_create_simple_input_block): Add mode_table argument to the
lto_input_block constructors.
* ipa-prop.c (ipa_prop_read_section, read_replacements_section):
Likewise.
* data-streamer-in.c (string_for_index): Likewise.
* ipa-inline-analysis.c (inline_read_section): Likewise.
* ipa-icf.c (sem_item_optimizer::read_section): Likewise.
* lto-cgraph.c (input_cgraph_opt_section): Likewise.
* lto-streamer-in.c (lto_read_body_or_constructor,
lto_input_toplevel_asms): Likewise.
(lto_input_mode_table): New function.
* tree-streamer-out.c (pack_ts_fixed_cst_value_fields,
pack_ts_decl_common_value_fields, pack_ts_type_common_value_fields):
Use bp_pack_machine_mode.
* real.h (struct real_format): Add name field.
* lto-streamer.h (enum lto_section_type): Add LTO_section_mode_table.
(class lto_input_block): Add mode_table member.
(lto_input_block::lto_input_block): Add mode_table_ argument,
initialize mode_table.
(struct lto_file_decl_data): Add mode_table field.
(lto_input_mode_table, lto_output_init_mode_table): New prototypes.
* tree-streamer-in.c (unpack_ts_fixed_cst_value_fields,
unpack_ts_decl_common_value_fields,
unpack_ts_type_common_value_fields): Call bp_unpack_machine_mode.
* tree-streamer.c (streamer_mode_table): New variable.
* real.c (ieee_single_format, mips_single_format,
motorola_single_format, spu_single_format, ieee_double_format,
mips_double_format, motorola_double_format,
ieee_extended_motorola_format, ieee_extended_intel_96_format,
ieee_extended_intel_128_format, ieee_extended_intel_96_round_53_format,
ibm_extended_format, mips_extended_format, ieee_quad_format,
mips_quad_format, vax_f_format, vax_d_format, vax_g_format,
decimal_single_format, decimal_double_format, decimal_quad_format,
ieee_half_format, arm_half_format, real_internal_format): Add name
field.
* config/pdp11/pdp11.c (pdp11_f_format, pdp11_d_format): Likewise.
lto/
* lto.c (lto_mode_identity_table): New variable.
(lto_read_decls): Add mode_table argument to the lto_input_block
constructor.
(lto_file_finalize): Initialize mode_table.
(lto_init): Initialize lto_mode_identity_table.

--- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c    2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
!lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj  2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   

Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Thomas Schwinge
Hi Jakub!

(Will respond to your other questions later.)


On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
  Do you literally have »nvptx-newlib symlinked into the gcc tree as
  newlib«?  If yes, then that should explain the problem: as I wrote in
  http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
  you need to »add a symbolic link to nvptx-newlib's newlib directory to
  the directory containing the GCC sources«, so not link [GCC]/newlib ->
  [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
  resolve the issue?

(It did.)  Can you suggest a better wording, to make this more clear in
the documentation?


 BTW, --with-cuda-driver-{include,lib} are apparently not documented in
 gcc/doc/ (--with-cuda-driver neither, but can't use that, as lib is
 /usr/local/cuda-6.5/lib64 in my case), and isn't documented on wiki/Offloading
 either.

Thanks for reporting; will fix that.


 ../configure --target=nvptx-none 
 --enable-as-accelerator-for=x86_64-pc-linux-gnu 
 --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin 
 --disable-sjlj-exceptions --enable-newlib-io-long-long
 make; make DESTDIR=/usr/src/gcc/objnvptxinst install
 
 and
 
 ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
 --target=x86_64-pc-linux-gnu 
 --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst 
 --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include 
 --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
 make; make DESTDIR=/usr/src/gcc/objnvptxinst install
 
 compilers now build

That looks very similar to what I'm using.  I currently install into
separate prefixes/DESTDIRS, because I have not yet verified that there
is no overlap in the installed files.


 offloading fails:
 
 /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
  @/tmp/cce9PdmR
 x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
 x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
 mkoffload: fatal error: 
 /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
  returned 1 exit status
 compilation terminated.
 lto-wrapper: fatal error: 
 /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
  returned 1 exit status
 compilation terminated.
 /usr/bin/ld: lto-wrapper failed
 collect2: error: ld returned 1 exit status
 
 Is --enable-languages=c,c++,fortran,lto required when configuring the
 offload compiler?  It isn't required for intelmic.

Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
defaults to supporting LTO, and due to this also defaults to building the
LTO front end.  I'll enhance the nvptx offloading documentation
accordingly.  Maybe we should add some magic to build the LTO front end
if --enable-as-accelerator-for=[...] has been specified?


Note that I recently added another prerequisite patch for nvptx
offloading to https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading:
http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E.
If that is not applied, you'll get run-time errors because in
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
can't find main$_omp_fn$0 and similar symbols.


Grüße,
 Thomas




Re: nvptx offloading patches [3/n], RFD

2015-02-18 Thread Jakub Jelinek
On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
 On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek ja...@redhat.com wrote:
  On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
   Do you literally have »nvptx-newlib symlinked into the gcc tree as
   newlib«?  If yes, then that should explain the problem: as I wrote in
   http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
   you need to »add a symbolic link to nvptx-newlib's newlib directory to
   the directory containing the GCC sources«, so not link [GCC]/newlib ->
   [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
   resolve the issue?
 
 (It did.)  Can you suggest a better wording, to make this more clear in
 the documentation?

Your wording is fine, but it should be listed on wiki/Offloading, and
perhaps in doc/install.texi too?

  offloading fails:
  
  /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
   @/tmp/cce9PdmR
  x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
  x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
  mkoffload: fatal error: 
  /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc
   returned 1 exit status
  compilation terminated.
  lto-wrapper: fatal error: 
  /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload
   returned 1 exit status
  compilation terminated.
  /usr/bin/ld: lto-wrapper failed
  collect2: error: ld returned 1 exit status
  
  Is --enable-languages=c,c++,fortran,lto required when configuring the
  offload compiler?  It isn't required for intelmic.
 
 Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
 defaults to supporting LTO, and due to this also defaults to building the
 LTO front end.  I'll enhance the nvptx offloading documentation
 accordingly.  Maybe we should add some magic to build the LTO front end
 if --enable-as-accelerator-for=[...] has been specified?

Toplevel configure.ac has:
  # If LTO is enabled, add the LTO front end.
  if test "$enable_lto" = "yes" ; then
    case ,${enable_languages}, in
      *,lto,*) ;;
      *) enable_languages="${enable_languages},lto" ;;
    esac
    if test "${build_lto_plugin}" = "yes" ; then
      configdirs="$configdirs lto-plugin"
    fi
  fi
so IMHO we want a similar snippet for the --enable-as-accelerator-for= case,
perhaps right below this one.  Not building the lto FE for the accelerator
compilers makes them completely useless, thus I think we really want to do
that automatically.

 Note that I recently added another prerequisite patch for nvptx
 offloading to https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading:
 http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E.
 If that is not applied, you'll get run-time errors because in
 libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction
 can't find main$_omp_fn$0 and similar symbols.

Can you adjust that to add a cgraph flag alongside the offloadable one
and use that instead of the attribute?

Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-17 Thread Jakub Jelinek
On Tue, Feb 17, 2015 at 04:21:06PM +0000, Joseph Myers wrote:
 On Tue, 17 Feb 2015, Jakub Jelinek wrote:
 
  Third attempt failed with:
  ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No 
  such file or directory
  compilation terminated.
  ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
  make[2]: *** [realloc.o] Error 1
  make[2]: *** Waiting for unfinished jobs
  make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
  I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
  would be built in-tree, is that not the case (at least wiki/Offloading
  mentions that).  Or is it just that libgcc can't really have dependencies on
  newlib headers as newlib is built after libgcc?
 
 I've committed this patch to fix this last issue (the header dependence, 
 that is; I don't know about the in-tree build).

Thanks, sure, libgcc now builds fine, the in-tree build fails:
configure:4261: checking for C compiler default output file name
configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ 
-nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem 
/usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem 
/usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ 
-B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem 
/usr/local/nvptx-none/sys-include    -g -O2   conftest.c  >&5
error opening libc.a
collect2: error: ld returned 1 exit status
very early during in-tree newlib configure.

Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-17 Thread Joseph Myers
On Tue, 17 Feb 2015, Jakub Jelinek wrote:

 Third attempt failed with:
 ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such 
 file or directory
 compilation terminated.
 ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
 make[2]: *** [realloc.o] Error 1
 make[2]: *** Waiting for unfinished jobs
 make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
 I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
 would be built in-tree, is that not the case (at least wiki/Offloading
 mentions that).  Or is it just that libgcc can't really have dependencies on
 newlib headers as newlib is built after libgcc?

I've committed this patch to fix this last issue (the header dependence, 
that is; I don't know about the in-tree build).

2015-02-17  Joseph Myers  jos...@codesourcery.com

* config/nvptx/realloc.c: Include stddef.h instead of stdlib.h
and string.h.
(__nvptx_realloc): Call __builtin_memcpy instead of memcpy.

Index: libgcc/config/nvptx/realloc.c
===
--- libgcc/config/nvptx/realloc.c   (revision 220763)
+++ libgcc/config/nvptx/realloc.c   (working copy)
@@ -21,8 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
http://www.gnu.org/licenses/.  */
 
-#include <stdlib.h>
-#include <string.h>
+#include <stddef.h>
 #include "nvptx-malloc.h"
 
 void *
@@ -44,7 +43,7 @@ __nvptx_realloc (void *ptr, size_t newsz)
   oldsz = *sp;
 }
   if (oldsz != 0)
-    memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
+    __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
 
   __nvptx_free (ptr);
   return newptr;
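
For illustration only, here is a minimal sketch of the same idea outside of
libgcc: a realloc that needs nothing beyond stddef.h because the copy goes
through __builtin_memcpy.  The bump allocator (sketch_malloc, arena) is a
made-up stand-in for __nvptx_malloc and never frees anything; it is an
assumption of this sketch, not the nvptx implementation.

```c
#include <stddef.h>   /* size_t, NULL, max_align_t; no libc headers needed */

/* Hypothetical arena standing in for the real allocator.  Every block is
   preceded by a header that records the requested size.  */
static _Alignas (max_align_t) unsigned char arena[4096];
static size_t arena_used;

void *
sketch_malloc (size_t sz)
{
  size_t hdr = sizeof (max_align_t);
  if (sz > sizeof arena)
    return NULL;
  /* Header plus payload rounded up to keep the next block aligned.  */
  size_t need = hdr + ((sz + hdr - 1) / hdr) * hdr;
  if (need > sizeof arena - arena_used)
    return NULL;
  unsigned char *base = arena + arena_used;
  arena_used += need;
  __builtin_memcpy (base, &sz, sizeof sz);
  return base + hdr;
}

void *
sketch_realloc (void *ptr, size_t newsz)
{
  void *newptr = sketch_malloc (newsz);
  if (newptr == NULL)
    return NULL;
  if (ptr != NULL)
    {
      size_t oldsz;
      __builtin_memcpy (&oldsz, (unsigned char *) ptr - sizeof (max_align_t),
                        sizeof oldsz);
      /* Copy the smaller of the old and new sizes.  */
      __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
    }
  /* The real function frees PTR here; this sketch has no free.  */
  return newptr;
}
```

Since __builtin_memcpy is provided by the compiler, the file compiles before
any newlib headers have been installed, which is the point of the change.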

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: nvptx offloading patches [3/n], RFD

2015-02-17 Thread Jakub Jelinek
On Tue, Feb 17, 2015 at 04:32:06PM +0300, Ilya Verbin wrote:
  If we don't try to write .gnu.offload_lto_* again, I think following patch
  with additionally not calling lto_write_mode_table for !lto_stream_offload_p
  and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
  a single shared identity table - might actually work.
  
  Thoughts on this?
 
 Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_*
 sections and produce temporary partitions with .gnu.lto_* sections.  And the
 ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections?

FYI, I have tested my mode_table patch with the intelmic emul offloading and
saw no regressions.

Then I went over and wanted to try nvptx offloading, but ran into
various issues.

I had two patches from Bernd (already approved; why haven't they been
installed?) applied, had to tweak the first one so that it applies,
then my mode_table patch.
I've built nvptx-tools and configured:
../configure --target=nvptx-none \
  --enable-as-accelerator-for=x86_64-pc-linux-gnu \
  --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin \
  --disable-sjlj-exceptions --enable-newlib-io-long-long
make -j16
This failed miserably because of missing mkoffload.o dependencies; patch
attached (ok for trunk?; it does what intelmic mkoffload.o does; I've tried
to add | $(generated_files) dependency instead, but that somehow didn't work).

The second attempt with that fixed died because for some reason
nvptx-none-as wants to verify by default using ptxas.  Can that be made
configurable?  E.g. for building nvptx offloading in distros, I believe
due to the proprietary cuda stuff it will be better if everything can be
built without the proprietary stuff and only used when actually running it.
E.g. could the verification be done by default only if ptxas is found in
$PATH and not if it isn't found?
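
A rough sketch of what "verify only if ptxas is found in $PATH" could look
like, purely as an illustration.  The helper names (program_in_dirs,
should_verify) are invented for this sketch and are not taken from
nvptx-tools; the real assembler may structure this quite differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if an executable called NAME exists in one of the
   colon-separated directories in DIRS, 0 otherwise.  Note that access ()
   also succeeds for searchable directories; a real implementation would
   probably stat the candidate as well.  */
int
program_in_dirs (const char *dirs, const char *name)
{
  if (dirs == NULL || name == NULL || *name == '\0')
    return 0;

  while (*dirs != '\0')
    {
      size_t len = strcspn (dirs, ":");
      char candidate[4096];
      int n;

      /* An empty element means the current directory.  */
      if (len == 0)
        n = snprintf (candidate, sizeof candidate, "./%s", name);
      else
        n = snprintf (candidate, sizeof candidate, "%.*s/%s",
                      (int) len, dirs, name);

      if (n > 0 && (size_t) n < sizeof candidate
          && access (candidate, X_OK) == 0)
        return 1;

      dirs += len;
      if (*dirs == ':')
        dirs++;
    }
  return 0;
}

/* An explicit request always wins; otherwise verify only when ptxas
   can be found in $PATH.  */
int
should_verify (int explicitly_requested)
{
  return explicitly_requested
         || program_in_dirs (getenv ("PATH"), "ptxas");
}
```

With something along these lines a distro build without the proprietary
tools would simply skip the verification step instead of failing.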

Third attempt failed with:
../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
compilation terminated.
../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
make[2]: *** [realloc.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it
would be built in-tree; is that not the case (at least wiki/Offloading
mentions that)?  Or is it just that libgcc can't really have dependencies on
newlib headers as newlib is built after libgcc?

Jakub
* cgraph.h (clone_function_name_1): Declare.
* cgraphclones.c (clone_function_name_1): New function.
(clone_function_name): Use it.
* lto-partition.c: Include stringpool.h.
(must_not_rename, maybe_rewrite_identifier,
validize_symbol_for_target): New static functions.
(privatize_symbol_name): Use must_not_rename.
(promote_symbol): Call validize_symbol_for_target.
(lto_promote_cross_file_statics): Likewise.
(lto_promote_statics_nonwpa): Likewise.

--- gcc/cgraph.h.jj 2015-02-16 11:19:03.474984223 +0100
+++ gcc/cgraph.h    2015-02-17 13:54:00.413964133 +0100
@@ -2206,6 +2206,7 @@ basic_block init_lowered_empty_function
 
 /* In cgraphclones.c  */
 
+tree clone_function_name_1 (const char *, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
--- gcc/cgraphclones.c.jj   2015-02-17 10:07:53.208582797 +0100
+++ gcc/cgraphclones.c  2015-02-17 13:54:00.413964133 +0100
@@ -533,19 +533,19 @@ cgraph_node::create_clone (tree decl, gc
   return new_node;
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
-
 static GTY(()) unsigned int clone_fn_id_num;
 
+/* Return a new assembler name for a clone with SUFFIX of a decl named
+   NAME.  */
+
 tree
-clone_function_name (tree decl, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  size_t len = IDENTIFIER_LENGTH (name);
+  size_t len = strlen (name);
   char *tmp_name, *prefix;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (prefix, IDENTIFIER_POINTER (name), len);
+  memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
 #ifndef NO_DOT_IN_LABEL
   prefix[len] = '.';
@@ -558,6 +558,16 @@ clone_function_name (tree decl, const ch
   return get_identifier (tmp_name);
 }
 
+/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+
+tree
+clone_function_name (tree decl, const char *suffix)
+{
+  tree name = DECL_ASSEMBLER_NAME (decl);
+  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+}
+
+
 /* Create callgraph node clone with new declaration.  The actual body will
be copied later at compilation stage.
 
--- gcc/lto/lto-partition.c.jj  2015-01-15 14:05:08.706092596 +0100
+++ gcc/lto/lto-partition.c 

Re: nvptx offloading patches [3/n], RFD

2015-02-17 Thread Ilya Verbin
On Mon, Feb 16, 2015 at 22:08:12 +0100, Jakub Jelinek wrote:
 Anyway, the question is if for offloading we use wpa stage at all these days
 or not at all, if there is a way for ACCEL_COMPILER to differentiate
 somehow between LTO sections written by the host compiler and LTO sections
 perhaps created by the offloading compiler when trying to LTO the thing (if
 it does it at all).  Because obviously the host compiler written LTO
 (in .gnu.offload_lto_*) would need the machine modes translated, while
 LTO streamed already by the ACCEL_COMPILER (if any) generally would already
 use the offloading target machine modes and therefore should be treated as
 native lto (.gnu.lto_*). 

Currently both intelmic and nvptx offloading compilers are executed in
non-partitioned LTO mode.  I don't know whether we need to support WHOPR
(WPA+LTRANS) mode.  Maybe it would be useful for programs with a large number of
target regions?  But I think this is not needed for GCC 5.
 
 If we don't try to write .gnu.offload_lto_* again, I think following patch
 with additionally not calling lto_write_mode_table for !lto_stream_offload_p
 and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
 a single shared identity table - might actually work.
 
 Thoughts on this?

Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_*
sections and produce temporary partitions with .gnu.lto_* sections.  And the
ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections?

  -- Ilya


Re: nvptx offloading patches [3/n], RFD

2015-02-17 Thread Richard Biener
On Mon, Feb 16, 2015 at 10:43 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote:
 Seeing the real format string you introduce I wonder if identifying modes
 by their names wouldn't work in 99% of all cases (apart from PSImode
 maybe).

 There are various corner cases.  Plus of course sometimes insignificant, but
 sometimes very significant, floating mode changes.  SFmode on one target
 might be completely different from another target.

But we can't deal with arbitrary target differences anyway - otherwise
we have generated wrong code already.

 Also for most cases we can construct the machine mode from the type.  Or
 where that is not possible stream the extra info that is necessary
 instead.

 I thought we've discussed that already on IRC.  E.g. decimal modes are
 identified only by mode and nothing else, and it doesn't look like it
 can be easily derived from types in many cases (spent quite some time on
 that).

Sure, still modes and types have quite some overlap in information
so we might be able to do more compact streaming (and at the same
time not rely on the machine-mode enum).  The machine-modes
of course are very compact to stream (they are basically a common set
of all possible types), and your mapping introduces kind of a cache
for common type properties.

I know that Honza wanted to make trees slimmer by taking into account
more (redundant) information from the modes associated with trees.

I'm just looking for a way to make this less of a hack (and the LTO IL
less target dependent).  Not for GCC 5 for which something like your
patch is probably ok, but for the future.

 Overall feels like a hack BTW :)  can't we assign machine mode enum IDs in
 a target independent way?  I mean, it doesn't have to be densely
 allocated?

 We iterate over modes, we have tons of tables indexed by modes, so if we
 introduce gaps, we'll make the compiler bigger and slower.
 If this is limited to the offloading path, like in the attached updated
 patch, the overhead for native LTO should be not measurable.

Sure.

Thanks,
Richard.

 --- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
 +++ gcc/passes.c    2015-02-16 22:19:20.842917807 +0100
 @@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
struct lto_out_decl_state *state = lto_new_out_decl_state ();
state->symtab_node_encoder = encoder;

 +  lto_output_init_mode_table ();
lto_push_out_decl_state (state);

gcc_assert (!flag_wpa);
 @@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
lto_symtab_encoder_iterator lsei;
state->symtab_node_encoder = encoder;

 +  lto_output_init_mode_table ();
lto_push_out_decl_state (state);
for (lsei = lsei_start_function_in_partition (encoder);
 !lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
 --- gcc/tree-streamer.h.jj  2015-02-16 22:18:33.222702266 +0100
 +++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
 @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.

  #include "streamer-hooks.h"
  #include "lto-streamer.h"
 +#include "data-streamer.h"
  #include "hash-map.h"

  /* Cache of pickled nodes.  Used to avoid writing the same node more
 @@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
  void streamer_write_builtin (struct output_block *, tree);

  /* In tree-streamer.c.  */
 +extern unsigned char streamer_mode_table[1 << 8];
  void streamer_check_handled_ts_structures (void);
  bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
  hashval_t, unsigned *);
 @@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
return cache->hashes[ix];
  }

 +static inline void
 +bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
 +{
 +  streamer_mode_table[mode] = 1;
 +  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
 +}
 +
 +static inline machine_mode
 +bp_unpack_machine_mode (struct bitpack_d *bp)
 +{
 +  return (machine_mode)
 +  ((struct lto_input_block *)
 +   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
 +}

  #endif  /* GCC_TREE_STREAMER_H  */
 --- gcc/lto-streamer-out.c.jj   2015-02-16 22:18:33.204702562 +0100
 +++ gcc/lto-streamer-out.c  2015-02-16 22:20:06.659163066 +0100
 @@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
  }


 +/* Init the streamer_mode_table for output, where we collect info on what
 +   machine_mode values have been streamed.  */
 +void
 +lto_output_init_mode_table (void)
 +{
 +  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
 +}
 +
 +
 +/* Write the mode table.  */
 +static void
 +lto_write_mode_table (void)
 +{
 +  struct output_block *ob;
 +  ob = create_output_block (LTO_section_mode_table);
 +  bitpack_d bp = bitpack_create (ob->main_stream);
 +
 +  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
 + also the inner mode marked.  */
 +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
 +if 

Re: nvptx offloading patches [3/n], RFD

2015-02-16 Thread Richard Biener
On February 16, 2015 10:08:12 PM CET, Jakub Jelinek ja...@redhat.com wrote:
Hi!

On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
 I think (also communicated that on IRC) we should instead try not streaming
 machine-modes at all but generating them at stream-in time via layout_type
 or layout_decl.

Here is a WIP prototype for being able to stream a machine mode description
table and streaming it back in.
In the end, I'd like to stream this out only for lto_stream_offload_p and
stream it in only for ACCEL_COMPILER reading in when available, but wanted
to see what it does even for native LTO.
For that it doesn't work very well, because it seems that wpa phase
doesn't stream in some sections and stream them out again, but instead
somehow copies them directly to the output object, so the mode table
isn't aware of the modes used in there that were bypassed this way.

Anyway, the question is if for offloading we use wpa stage at all these days
or not at all, if there is a way for ACCEL_COMPILER to differentiate
somehow between LTO sections written by the host compiler and LTO sections
perhaps created by the offloading compiler when trying to LTO the thing (if
it does it at all).  Because obviously the host compiler written LTO
(in .gnu.offload_lto_*) would need the machine modes translated, while
LTO streamed already by the ACCEL_COMPILER (if any) generally would already
use the offloading target machine modes and therefore should be treated as
native lto (.gnu.lto_*).

If we don't try to write .gnu.offload_lto_* again, I think following patch
with additionally not calling lto_write_mode_table for !lto_stream_offload_p
and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build
a single shared identity table - might actually work.

Thoughts on this?

Seeing the real format string you introduce I wonder if identifying modes by 
their names wouldn't work in 99% of all cases (apart from PSImode maybe).

Also for most cases we can construct the machine mode from the type.  Or where 
that is not possible stream the extra info that is necessary instead.

Overall feels like a hack BTW :)  can't we assign machine mode enum IDs in a 
target independent way?  I mean, it doesn't have to be densely allocated?

Richard.

Bernd/Thomas, do you plan to commit the other approved patches soon?

--- gcc/passes.c.jj    2015-02-16 20:14:09.477345693 +0100
+++ gcc/passes.c   2015-02-16 20:26:23.659299189 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
!lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj 2015-02-16 20:14:09.446346202 +0100
+++ gcc/tree-streamer.h    2015-02-16 21:14:50.701615850 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+ ((struct lto_input_block *)
+  bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj  2015-02-16 20:14:09.046352765 +0100
+++ gcc/lto-streamer-out.c 2015-02-16 20:26:23.665299091 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that 

Re: nvptx offloading patches [3/n], RFD

2015-02-16 Thread Jakub Jelinek
On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote:
 Seeing the real format string you introduce I wonder if identifying modes
 by their names wouldn't work in 99% of all cases (apart from PSImode
 maybe).

There are various corner cases.  Plus of course sometimes insignificant, but
sometimes very significant, floating mode changes.  SFmode on one target
might be completely different from another target.

 Also for most cases we can construct the machine mode from the type.  Or
 where that is not possible stream the extra info that is necessary
 instead.

I thought we've discussed that already on IRC.  E.g. decimal modes are
identified only by mode and nothing else, and it doesn't look like it
can be easily derived from types in many cases (spent quite some time on
that).

 Overall feels like a hack BTW :)  can't we assign machine mode enum IDs in
 a target independent way?  I mean, it doesn't have to be densely
 allocated?

We iterate over modes, we have tons of tables indexed by modes, so if we
introduce gaps, we'll make the compiler bigger and slower.
If this is limited to the offloading path, like in the attached updated
patch, the overhead for native LTO should not be measurable.
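
A minimal sketch of the gap cost, for illustration only: this is not GCC
code, and build_size_table and the (mode ID, size) pairs are hypothetical.
A table indexed directly by mode ID has to span up to the largest ID, so
sparse IDs inflate every such table and every loop that walks it.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of GCC: build a table of mode sizes that is
// indexed directly by mode ID, the way many per-mode tables are indexed.
// Each input pair is (mode ID, size in bytes).
std::vector<unsigned>
build_size_table (const std::vector<std::pair<unsigned, unsigned>> &modes)
{
  unsigned max_id = 0;
  for (const auto &m : modes)
    max_id = std::max (max_id, m.first);

  // One slot per possible ID up to the largest one; unused IDs stay 0.
  std::vector<unsigned> table (modes.empty () ? 0 : max_id + 1, 0);
  for (const auto &m : modes)
    table[m.first] = m.second;
  return table;
}
```

With three modes numbered 0, 1, 2 the table has three slots; with the same
three modes numbered 0, 100, 200 it has 201 slots, almost all unused.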

--- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c 2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
!lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj  2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+  ((struct lto_input_block *)
+   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj   2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-out.c  2015-02-16 22:20:06.659163066 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+ also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+if (streamer_mode_table[i])
+  {
+   machine_mode m = (machine_mode) i;
+   if (GET_MODE_INNER (m) != VOIDmode)
+ streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+  }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+ so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+  if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
+   {
+ machine_mode m = (machine_mode) i;
+ if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+   continue;
+ bp_pack_value (bp, m, 8);
+ bp_pack_enum (bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
+ bp_pack_value (bp, GET_MODE_SIZE (m), 8);
+ bp_pack_value (bp, GET_MODE_PRECISION (m), 16);
+ bp_pack_value (bp, GET_MODE_INNER (m), 8);
+ bp_pack_value (bp, GET_MODE_NUNITS (m), 8);
+ switch (GET_MODE_CLASS (m))

Re: nvptx offloading patches [3/n], RFD

2015-02-16 Thread Jakub Jelinek
Hi!

On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> I think (also communicated that on IRC) we should instead try not streaming
> machine-modes at all but generating them at stream-in time via layout_type
> or layout_decl.

Here is a WIP prototype for being able to stream a machine mode description
table and streaming it back in.
In the end, I'd like to stream this out only for lto_stream_offload_p and
stream it in only for ACCEL_COMPILER reading in when available, but wanted
to see what it does even for native LTO.
For that it doesn't work very well, because it seems that wpa phase
doesn't stream in some sections and stream them out again, but instead
somehow copies them directly to the output object, so the mode table
isn't aware of the modes used in there that were bypassed this way.

Anyway, the question is whether we use the wpa stage for offloading at all
these days, and whether there is a way for ACCEL_COMPILER to differentiate
between LTO sections written by the host compiler and LTO sections
perhaps created by the offloading compiler when trying to LTO the thing (if
it does that at all).  Obviously the LTO written by the host compiler
(in .gnu.offload_lto_*) would need the machine modes translated, while
LTO already streamed by the ACCEL_COMPILER (if any) would generally already
use the offloading target machine modes and therefore should be treated as
native LTO (.gnu.lto_*).

If we don't try to write .gnu.offload_lto_* again, I think the following patch,
with the addition of not calling lto_write_mode_table for !lto_stream_offload_p
and not calling lto_input_mode_table for !ACCEL_COMPILER (building a single
shared identity table instead), might actually work.
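
A minimal sketch of what such an identity table could look like, for
illustration only: mode_table_t, make_identity_mode_table and translate_mode
are hypothetical names, not part of the patch.  The only thing taken from the
patch is the 1 << 8 bound on streamed mode values.

```cpp
#include <array>
#include <cstddef>

// Same bound as the 1 << 8 used by bp_pack_machine_mode in the patch.
constexpr std::size_t MODE_TABLE_SIZE = 1 << 8;
using mode_table_t = std::array<unsigned char, MODE_TABLE_SIZE>;

// Hypothetical helper: a table in which every streamed value maps to
// itself, usable when writer and reader share the same mode enum.
mode_table_t
make_identity_mode_table ()
{
  mode_table_t table{};
  for (std::size_t i = 0; i < table.size (); i++)
    table[i] = static_cast<unsigned char> (i);
  return table;
}

// Hypothetical helper: translate a streamed mode value through a table,
// mirroring the mode_table[...] lookup in bp_unpack_machine_mode.
unsigned char
translate_mode (const mode_table_t &table, unsigned char streamed)
{
  return table[streamed];
}
```

With the identity table the lookup is a no-op, so the reading side can go
through the same path whether or not a real mode table was streamed.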

Thoughts on this?

Bernd/Thomas, do you plan to commit the other approved patches soon?

--- gcc/passes.c.jj 2015-02-16 20:14:09.477345693 +0100
+++ gcc/passes.c 2015-02-16 20:26:23.659299189 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
!lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj  2015-02-16 20:14:09.446346202 +0100
+++ gcc/tree-streamer.h 2015-02-16 21:14:50.701615850 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
 hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+  ((struct lto_input_block *)
+   bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif  /* GCC_TREE_STREAMER_H  */
--- gcc/lto-streamer-out.c.jj   2015-02-16 20:14:09.046352765 +0100
+++ gcc/lto-streamer-out.c  2015-02-16 20:26:23.665299091 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+ also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+if (streamer_mode_table[i])
+  {
+   machine_mode m = (machine_mode) i;
+   if (GET_MODE_INNER (m) != VOIDmode)
+ streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+  }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+ so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+ 

Re: nvptx offloading patches [3/n], RFD

2015-02-09 Thread Richard Biener
On Wed, Feb 4, 2015 at 12:38 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
>> This is not against current trunk; it applies to gomp-4_0-branch where it is
>> one of the necessary parts to make offloading x86-nvptx work. The issue is
>> that the LTO file format depends on the machine_modes enum, it needs to
>> match between host and offload target. The easiest way to do this is to just
>> use the host-modes.def when compiling an offload compiler.
>>
>> Ports that want to be hosts for offloading may need to modify their
>> modes.def. The patch below contains changes to i386-modes.def which modifies
>> XFmode depending on a target switch. I'm not actually entirely sure what to
>> do about this. Do we want to make this flag an error when offloading is
>> enabled? Or maybe add float format support to the -foffload-abi option?
>>
>> Thoughts? Ok for the first part of the patch once the other offloading
>> patches have gone in (bootstrapped and tested on x86_64-linux)?
>
> I don't like this at all.
>
> IMHO instead we should stream in the offloading LTO sections some kind of mode
> description table (perhaps limited to the modes actually ever streamed),
> and when reading back the offloading LTO sections, let the offloading
> compiler remap the modes to its own modes where there is a mapping in
> between the two, choose some other mapping (e.g. map various vector modes
> the host has but offloading target does not to say BLKmode), or give up
> otherwise with offloading (say if you attempt to stream floating point modes
> the offloading target doesn't support etc.).
>
> So perhaps stream for each used mode the mode value, corresponding mode
> class, size, precision, inner mode, nunits, and for floating point modes
> supposedly somehow encode the real_format (perhaps just add a name ->
> struct real_format mapping for the real.c modes, and map anything else
> to unknown).

I think (also communicated that on IRC) we should instead try not streaming
machine-modes at all but generating them at stream-in time via layout_type
or layout_decl.

Richard.

> Jakub


Re: nvptx offloading patches [3/n], RFD

2015-02-04 Thread Jakub Jelinek
On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
> This is not against current trunk; it applies to gomp-4_0-branch where it is
> one of the necessary parts to make offloading x86-nvptx work. The issue is
> that the LTO file format depends on the machine_modes enum, it needs to
> match between host and offload target. The easiest way to do this is to just
> use the host-modes.def when compiling an offload compiler.
>
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which modifies
> XFmode depending on a target switch. I'm not actually entirely sure what to
> do about this. Do we want to make this flag an error when offloading is
> enabled? Or maybe add float format support to the -foffload-abi option?
>
> Thoughts? Ok for the first part of the patch once the other offloading
> patches have gone in (bootstrapped and tested on x86_64-linux)?

I don't like this at all.

IMHO instead we should stream in the offloading LTO sections some kind of mode
description table (perhaps limited to the modes actually ever streamed),
and when reading back the offloading LTO sections, let the offloading
compiler remap the modes to its own modes where there is a mapping in
between the two, choose some other mapping (e.g. map various vector modes
the host has but offloading target does not to say BLKmode), or give up
otherwise with offloading (say if you attempt to stream floating point modes
the offloading target doesn't support etc.).

So perhaps stream for each used mode the mode value, corresponding mode
class, size, precision, inner mode, nunits, and for floating point modes
supposedly somehow encode the real_format (perhaps just add a name ->
struct real_format mapping for the real.c modes, and map anything else
to unknown).
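
A minimal sketch of such a remapping, purely for illustration: mode_desc,
mode_class_t and remap_modes are hypothetical names, not GCC code, and the
real_format matching for floating point modes is left out.  Each host mode
is described by class, size, precision, inner mode and nunits, and is matched
against the target's modes.  Modes without an inner mode are mapped first so
that vector-like modes can refer to an already mapped inner mode.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified mode classes (GCC has many more).
enum mode_class_t { CLASS_INT, CLASS_FLOAT, CLASS_VECTOR };

// Hypothetical per-mode description, following the fields listed above.
struct mode_desc
{
  mode_class_t mclass;
  unsigned size;       // size in bytes
  unsigned precision;  // precision in bits
  int inner;           // index of the inner mode in the same table, or -1
  unsigned nunits;
};

// For every host mode return the index of the matching target mode,
// or -1 when the target has no equivalent.  What to do with -1 (fall back
// to something like BLKmode, or give up on offloading) is up to the caller.
std::vector<int>
remap_modes (const std::vector<mode_desc> &host,
             const std::vector<mode_desc> &target)
{
  std::vector<int> map (host.size (), -1);
  // Pass 0 handles modes without an inner mode, pass 1 the rest.
  for (int pass = 0; pass < 2; pass++)
    for (std::size_t h = 0; h < host.size (); h++)
      {
        const mode_desc &hm = host[h];
        if ((hm.inner == -1) != (pass == 0))
          continue;
        // Translate the host inner mode into the target's numbering.
        int want_inner = hm.inner == -1 ? -1 : map[hm.inner];
        if (hm.inner != -1 && want_inner == -1)
          continue;  // inner mode itself has no target equivalent
        for (std::size_t t = 0; t < target.size (); t++)
          {
            const mode_desc &tm = target[t];
            if (tm.mclass == hm.mclass && tm.size == hm.size
                && tm.precision == hm.precision
                && tm.inner == want_inner && tm.nunits == hm.nunits)
              {
                map[h] = static_cast<int> (t);
                break;
              }
          }
      }
  return map;
}
```

Modes left at -1 are exactly the cases where the reader has to pick some
other mapping or refuse to offload.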

Jakub


Re: nvptx offloading patches [3/n], RFD

2014-11-03 Thread Jeff Law

On 11/01/14 05:57, Bernd Schmidt wrote:

> This is not against current trunk; it applies to gomp-4_0-branch where
> it is one of the necessary parts to make offloading x86-nvptx work. The
> issue is that the LTO file format depends on the machine_modes enum, it
> needs to match between host and offload target. The easiest way to do
> this is to just use the host-modes.def when compiling an offload compiler.
>
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which
> modifies XFmode depending on a target switch. I'm not actually entirely
> sure what to do about this. Do we want to make this flag an error when
> offloading is enabled? Or maybe add float format support to the
> -foffload-abi option?
>
> Thoughts? Ok for the first part of the patch once the other offloading
> patches have gone in (bootstrapped and tested on x86_64-linux)?

It feels like we've got another real distinction to make.  We've had
host, build & target and they're all independent.  It feels like we need
an offload target as well, and a better separation between target and
offload target.  Then we need to figure out the places where we've got
bleed-out.


I'm not sure how to deal with this beyond the immediate term other than by
using a hack like this.  Though I'd prefer to avoid the #ifdef, as it seems
to me this shouldn't be baked in at build/configure time.


jeff


nvptx offloading patches [3/n], RFD

2014-11-01 Thread Bernd Schmidt
This is not against current trunk; it applies to gomp-4_0-branch where 
it is one of the necessary parts to make offloading x86-nvptx work. The 
issue is that the LTO file format depends on the machine_modes enum, it 
needs to match between host and offload target. The easiest way to do 
this is to just use the host-modes.def when compiling an offload compiler.


Ports that want to be hosts for offloading may need to modify their 
modes.def. The patch below contains changes to i386-modes.def which 
modifies XFmode depending on a target switch. I'm not actually entirely 
sure what to do about this. Do we want to make this flag an error when 
offloading is enabled? Or maybe add float format support to the 
-foffload-abi option?


Thoughts? Ok for the first part of the patch once the other offloading 
patches have gone in (bootstrapped and tested on x86_64-linux)?



Bernd
	* config.gcc (offload_host_cpu_type): Compute.
	(extra_modes): Use it to pick the offload host CPU's modes.def when
	building an offload target.
	* config/i386/i386-modes.def (XF): Skip adjustments when building an
	offload compiler.

Index: gomp-4_0-branch/gcc/config.gcc
===
--- gomp-4_0-branch.orig/gcc/config.gcc
+++ gomp-4_0-branch/gcc/config.gcc
@@ -483,15 +483,26 @@ tilepro*-*-*)
 	;;
 esac
 
+offload_host_cpu_type=${cpu_type}
+if test x${enable_as_accelerator} != xno
+then
+	offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+fi
+case ${offload_host_cpu_type} in
+x86_64)
+  offload_host_cpu_type=i386
+	  ;;
+esac
+
 tm_file=${cpu_type}/${cpu_type}.h
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-protos.h
 then
 	tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def
+if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 then
-	extra_modes=${cpu_type}/${cpu_type}-modes.def
+	extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
Index: gomp-4_0-branch/gcc/config/i386/i386-modes.def
===
--- gomp-4_0-branch.orig/gcc/config/i386/i386-modes.def
+++ gomp-4_0-branch/gcc/config/i386/i386-modes.def
@@ -24,6 +24,9 @@ along with GCC; see the file COPYING3.
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* This file may be used when building a compiler for an offload target.
+   Assume that no special floating point options are used.  */
+#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -33,6 +36,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
 			  : ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
+#endif
 
 /* Add any extra modes needed to represent the condition code.