Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)
On Fri, Jul 31, 2015 at 07:53:16PM +0300, Ilya Verbin wrote:
> On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
> > I've noticed that target MIC compiler from trunk hangs forever in
> > lto_input_mode_table in this loop, even on simple testcases.
> >
> > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> > > +     if not found, fallback to all modes.  */
> > > +  int pass;
> > > +  for (pass = 0; pass < 2; pass++)
> > > +    for (machine_mode mr = pass ? VOIDmode
> > > +                                : GET_CLASS_NARROWEST_MODE (mclass);
> > > +         pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > +         pass ? mr = (machine_mode) (m + 1)
> > > +              : mr = GET_MODE_WIDER_MODE (mr))
> > > +      if (GET_MODE_CLASS (mr) != mclass
> > > +          || GET_MODE_SIZE (mr) != size
> > > +          || GET_MODE_PRECISION (mr) != prec
> > > +          || GET_MODE_INNER (mr) != inner
> > > +          || GET_MODE_IBIT (mr) != ibit
> > > +          || GET_MODE_FBIT (mr) != fbit
> > > +          || GET_MODE_NUNITS (mr) != nunits)
> > > +        continue;
> >
> > Given that gomp-4_1-branch works ok, the problem was introduced somewhere
> > between 9 and 31 Jul.  I'll try to find the revision.
>
> Shouldn't 'mr' be here instead of 'm'?

I think so.  If it works, patch preapproved.
But wonder what changed that we haven't been triggering it before.
What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?

        Jakub
Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)
Hi!

I've noticed that target MIC compiler from trunk hangs forever in
lto_input_mode_table in this loop, even on simple testcases.

On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> +     if not found, fallback to all modes.  */
> +  int pass;
> +  for (pass = 0; pass < 2; pass++)
> +    for (machine_mode mr = pass ? VOIDmode
> +                                : GET_CLASS_NARROWEST_MODE (mclass);
> +         pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> +         pass ? mr = (machine_mode) (m + 1)
> +              : mr = GET_MODE_WIDER_MODE (mr))
> +      if (GET_MODE_CLASS (mr) != mclass
> +          || GET_MODE_SIZE (mr) != size
> +          || GET_MODE_PRECISION (mr) != prec
> +          || GET_MODE_INNER (mr) != inner
> +          || GET_MODE_IBIT (mr) != ibit
> +          || GET_MODE_FBIT (mr) != fbit
> +          || GET_MODE_NUNITS (mr) != nunits)
> +        continue;

Given that gomp-4_1-branch works ok, the problem was introduced somewhere
between 9 and 31 Jul.  I'll try to find the revision.

  -- Ilya
Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)
On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
> I've noticed that target MIC compiler from trunk hangs forever in
> lto_input_mode_table in this loop, even on simple testcases.
>
> On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> > +     if not found, fallback to all modes.  */
> > +  int pass;
> > +  for (pass = 0; pass < 2; pass++)
> > +    for (machine_mode mr = pass ? VOIDmode
> > +                                : GET_CLASS_NARROWEST_MODE (mclass);
> > +         pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > +         pass ? mr = (machine_mode) (m + 1)
> > +              : mr = GET_MODE_WIDER_MODE (mr))
> > +      if (GET_MODE_CLASS (mr) != mclass
> > +          || GET_MODE_SIZE (mr) != size
> > +          || GET_MODE_PRECISION (mr) != prec
> > +          || GET_MODE_INNER (mr) != inner
> > +          || GET_MODE_IBIT (mr) != ibit
> > +          || GET_MODE_FBIT (mr) != fbit
> > +          || GET_MODE_NUNITS (mr) != nunits)
> > +        continue;
>
> Given that gomp-4_1-branch works ok, the problem was introduced somewhere
> between 9 and 31 Jul.  I'll try to find the revision.

Shouldn't 'mr' be here instead of 'm'?

  mr = (machine_mode) (m + 1)

  -- Ilya
Re: Regression in target MIC compiler (was: nvptx offloading patches [3/n], RFD)
On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> On Fri, Jul 31, 2015 at 07:53:16PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 31, 2015 at 19:27:58 +0300, Ilya Verbin wrote:
> > > I've noticed that target MIC compiler from trunk hangs forever in
> > > lto_input_mode_table in this loop, even on simple testcases.
> > >
> > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider modes,
> > > > +     if not found, fallback to all modes.  */
> > > > +  int pass;
> > > > +  for (pass = 0; pass < 2; pass++)
> > > > +    for (machine_mode mr = pass ? VOIDmode
> > > > +                                : GET_CLASS_NARROWEST_MODE (mclass);
> > > > +         pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > +         pass ? mr = (machine_mode) (m + 1)
> > > > +              : mr = GET_MODE_WIDER_MODE (mr))
> > > > +      if (GET_MODE_CLASS (mr) != mclass
> > > > +          || GET_MODE_SIZE (mr) != size
> > > > +          || GET_MODE_PRECISION (mr) != prec
> > > > +          || GET_MODE_INNER (mr) != inner
> > > > +          || GET_MODE_IBIT (mr) != ibit
> > > > +          || GET_MODE_FBIT (mr) != fbit
> > > > +          || GET_MODE_NUNITS (mr) != nunits)
> > > > +        continue;
> > >
> > > Given that gomp-4_1-branch works ok, the problem was introduced somewhere
> > > between 9 and 31 Jul.  I'll try to find the revision.
> >
> > Shouldn't 'mr' be here instead of 'm'?
>
> I think so.  If it works, patch preapproved.

It fixes the infinite loop, but causes an error:
lto1: fatal error: unsupported mode QI

> But wonder what changed that we haven't been triggering it before.
> What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?

When it hangs, mr is HImode.

  -- Ilya
Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)
Hi Bernd!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <ber...@codesourcery.com> wrote:
> issue when trying to get at the libgcc for the nvptx accel compiler
> after it's been installed.  The libgcc Makefile puts it in the wrong
> place - gcc/nvptx-none/accel/nvptx-none instead of
> gcc/host/accel/nvptx-none.  The patch below corrects that and removes an
> intelmicemul special case which I believe has the same effect - Ilya,
> could you test this?

Works fine for me for intelmic (no changes), and nvptx (changes as
expected).  You'll want to remove the following debugging print statement
before commit:

> --- libgcc/configure.ac (revision 445788)
> +++ libgcc/configure.ac (working copy)
> @@ -398,16 +398,14 @@ esac
>  # Used for constructing correct paths for offload compilers.
>  accel_dir_suffix=
> +real_host_noncanonical=${host_noncanonical}
> +echo eaaf: $enable_as_accelerator_for


Grüße,
 Thomas
Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
Hi!

On Thu, 19 Feb 2015 11:51:02 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
> > Like this?
>
> Yes.
>
> > commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
> > Author: Thomas Schwinge <tho...@codesourcery.com>
> > Date:   Thu Feb 19 11:41:23 2015 +0100
> >
> >     If we're building an offloading compiler, always enable the LTO front end.
> >
> >         * configure.ac [--enable-as-accelerator-for] (enable_languages):
> >         Make sure it contains lto.
> >         * configure: Regenerate.
>
> Ok for trunk.

Committed in r220838.


Grüße,
 Thomas
Re: Offloading compilers' libgcc installation (was: nvptx offloading patches [3/n], RFD)
On Fri, Feb 20, 2015 at 10:27:26 +0100, Thomas Schwinge wrote:
> On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <ber...@codesourcery.com> wrote:
> > issue when trying to get at the libgcc for the nvptx accel compiler
> > after it's been installed.  The libgcc Makefile puts it in the wrong
> > place - gcc/nvptx-none/accel/nvptx-none instead of
> > gcc/host/accel/nvptx-none.  The patch below corrects that and removes an
> > intelmicemul special case which I believe has the same effect - Ilya,
> > could you test this?
>
> Works fine for me for intelmic (no changes), and nvptx (changes as
> expected).

OK to me.

Thanks,
  -- Ilya
Re: nvptx offloading patches [3/n], RFD
On 02/17/2015 05:40 PM, Jakub Jelinek wrote:
> On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > Third attempt failed with:
> > > ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> > > compilation terminated.
> > > ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> > > make[2]: *** [realloc.o] Error 1
> > > make[2]: *** Waiting for unfinished jobs
> > > make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
> > >
> > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected
> > > it would be built in-tree, is that not the case (at least wiki/Offloading
> > > mentions that).  Or is it just that libgcc can't really have dependencies
> > > on newlib headers as newlib is built after libgcc?
> >
> > I've committed this patch to fix this last issue (the header dependence,
> > that is; I don't know about the in-tree build).
>
> Thanks, sure, libgcc now builds fine, the in-tree build fails:
> configure:4261: checking for C compiler default output file name
> configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include -g -O2 conftest.c >&5
> error opening libc.a
> collect2: error: ld returned 1 exit status
> very early during in-tree newlib configure.

Not a fix for your problem, but there's a similar issue when trying to get
at the libgcc for the nvptx accel compiler after it's been installed.  The
libgcc Makefile puts it in the wrong place - gcc/nvptx-none/accel/nvptx-none
instead of gcc/host/accel/nvptx-none.  The patch below corrects that and
removes an intelmicemul special case which I believe has the same effect -
Ilya, could you test this?

Bernd

Index: libgcc/Makefile.in
===================================================================
--- libgcc/Makefile.in (revision 445788)
+++ libgcc/Makefile.in (working copy)
@@ -45,6 +45,7 @@ fixed_point = @fixed_point@
 with_aix_soname = @with_aix_soname@

 host_noncanonical = @host_noncanonical@
+real_host_noncanonical = @real_host_noncanonical@
 target_noncanonical = @target_noncanonical@

 # List of extra object files that should be compiled for this target machine.
@@ -185,7 +186,7 @@ STRIP = @STRIP@
 STRIP_FOR_TARGET = $(STRIP)

 # Directory in which the compiler finds libraries etc.
-libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
+libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
 # Used to install the shared libgcc.
 slibdir = @slibdir@
 # Maybe used for DLLs on Windows targets.
Index: libgcc/configure.ac
===================================================================
--- libgcc/configure.ac (revision 445788)
+++ libgcc/configure.ac (working copy)
@@ -398,16 +398,14 @@ esac

 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo eaaf: $enable_as_accelerator_for
 if test x$enable_as_accelerator_for != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case ${target_noncanonical} in
-    *-intelmicemul-*)
-      # In this case we expect offload compiler to be built as native, so we
-      # need to change install directory for driver to be able to find libgcc.
-      host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi
 AC_SUBST(accel_dir_suffix)
+AC_SUBST(real_host_noncanonical)

 if test x$enable_offload_targets != x; then
   extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"
Index: libgcc/configure
===================================================================
--- libgcc/configure (revision 445788)
+++ libgcc/configure (working copy)
@@ -566,6 +566,7 @@ sfp_machine_header
 set_use_emutls
 set_have_cc_tls
 vis_hide
+real_host_noncanonical
 accel_dir_suffix
 force_explicit_eh_registry
 fixed_point
@@ -4482,17 +4483,15 @@ esac

 # Used for constructing correct paths for offload compilers.
 accel_dir_suffix=
+real_host_noncanonical=${host_noncanonical}
+echo eaaf: $enable_as_accelerator_for
 if test x$enable_as_accelerator_for != x; then
   accel_dir_suffix=/accel/${target_noncanonical}
-  case ${target_noncanonical} in
-    *-intelmicemul-*)
-      # In this case we expect offload compiler to be built as native, so we
-      # need to change install directory for driver to be able to find libgcc.
-      host_noncanonical=${enable_as_accelerator_for} ;;
-  esac
+  real_host_noncanonical=${enable_as_accelerator_for}
 fi

+
 if test x$enable_offload_targets != x; then
   extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"
 fi
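
The effect of the patch on the install directory can be sketched in isolation. This is a minimal illustration, not code from libgcc: the function name libsubdir_for and its argument order are hypothetical, and the sample values (/usr/local/lib, 5.0.0, x86_64-pc-linux-gnu) are assumptions taken from paths that appear elsewhere in this thread. It only mirrors the variable logic of the patched configure.ac and the libsubdir line in Makefile.in.

```shell
#!/bin/sh
# Hypothetical helper: compute libsubdir the way the patched
# libgcc/configure.ac and Makefile.in would.
# Usage: libsubdir_for LIBDIR HOST TARGET VERSION ACCELERATOR_FOR
libsubdir_for() {
  libdir=$1
  host_noncanonical=$2
  target_noncanonical=$3
  version=$4
  enable_as_accelerator_for=$5

  accel_dir_suffix=
  real_host_noncanonical=$host_noncanonical
  if test "x$enable_as_accelerator_for" != x; then
    # Offload compiler: install below the offload host's directory,
    # in an accel/<target> subdirectory.
    accel_dir_suffix=/accel/$target_noncanonical
    real_host_noncanonical=$enable_as_accelerator_for
  fi
  printf '%s\n' "$libdir/gcc/$real_host_noncanonical/$version$accel_dir_suffix"
}

# Offload compiler for nvptx-none, accelerating an x86_64 host.
libsubdir_for /usr/local/lib nvptx-none nvptx-none 5.0.0 x86_64-pc-linux-gnu
# Regular (non-offload) compiler: path is unchanged.
libsubdir_for /usr/local/lib x86_64-pc-linux-gnu x86_64-pc-linux-gnu 5.0.0 ""
```

With these sample values the first call yields a path under gcc/x86_64-pc-linux-gnu/5.0.0/accel/nvptx-none, which is the gcc/host/accel/nvptx-none layout the message describes, rather than gcc/nvptx-none/....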
If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
Hi!

On Wed, 18 Feb 2015 13:35:18 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote:
> > On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> > > offloading fails:
> > > /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
> > > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > > x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> > > mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> > > compilation terminated.
> > > lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> > > compilation terminated.
> > > /usr/bin/ld: lto-wrapper failed
> > > collect2: error: ld returned 1 exit status
> > >
> > > Is --enable-languages=c,c++,fortran,lto required when configuring
> > > the offload compiler?  It isn't required for intelmic.
> >
> > Yes, exactly.  I assume the reason is that x86_64-intelmicemul-linux-gnu
> > defaults to supporting LTO, and due to this also defaults to building
> > the LTO front end.  I'll enhance the nvptx offloading documentation
> > accordingly.  Maybe we should add some magic to build the LTO front end
> > if --enable-as-accelerator-for=[...] has been specified?
>
> Toplevel configure.ac has:
>
> # If LTO is enabled, add the LTO front end.
> if test $enable_lto = yes ; then
>   case ,${enable_languages}, in
>     *,lto,*) ;;
>     *) enable_languages=${enable_languages},lto ;;
>   esac
>   if test ${build_lto_plugin} = yes ; then
>     configdirs="$configdirs lto-plugin"
>   fi
> fi
>
> so IMHO we want similar snippet for the --enable-as-accelerator-for= case,
> perhaps right below this one.  Not building lto FE for the accelerator
> compilers make them completely useless, thus I think we really want to do
> that automatically.

Like this?

commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
Author: Thomas Schwinge <tho...@codesourcery.com>
Date:   Thu Feb 19 11:41:23 2015 +0100

    If we're building an offloading compiler, always enable the LTO front end.

        * configure.ac [--enable-as-accelerator-for] (enable_languages):
        Make sure it contains lto.
        * configure: Regenerate.
---
 configure    |    8 ++++++++
 configure.ac |    8 ++++++++
 2 files changed, 16 insertions(+)

diff --git configure configure
index dd794db..2afc52b 100755
--- configure
+++ configure
@@ -6217,6 +6217,14 @@ if test -d ${srcdir}/gcc; then
     fi
   fi

+  # If we're building an offloading compiler, add the LTO front end.
+  if test x$enable_as_accelerator_for != x ; then
+    case ,${enable_languages}, in
+      *,lto,*) ;;
+      *) enable_languages=${enable_languages},lto ;;
+    esac
+  fi
+
   missing_languages=`echo ,$enable_languages, | sed -e s/,all,/,/ -e s/,c,/,/ `
   potential_languages=,c,

diff --git configure.ac configure.ac
index 4ea5e00..08a6fbf 100644
--- configure.ac
+++ configure.ac
@@ -1918,6 +1918,14 @@ if test -d ${srcdir}/gcc; then
     fi
   fi

+  # If we're building an offloading compiler, add the LTO front end.
+  if test x$enable_as_accelerator_for != x ; then
+    case ,${enable_languages}, in
+      *,lto,*) ;;
+      *) enable_languages=${enable_languages},lto ;;
+    esac
+  fi
+
   missing_languages=`echo ,$enable_languages, | sed -e s/,all,/,/ -e s/,c,/,/ `
   potential_languages=,c,


Grüße,
 Thomas
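
The case idiom used by the commit can be tried on its own. The following is a minimal sketch, assuming a standalone helper named ensure_lto (a hypothetical name, not part of GCC's configure) that takes the comma-separated language list as its argument. Wrapping the list in commas before matching is what lets the pattern *,lto,* match lto at the start, middle, or end of the list without matching a longer name that merely contains "lto".

```shell
#!/bin/sh
# Hypothetical helper mirroring the configure.ac snippet: append "lto"
# to a comma-separated language list unless it is already present.
ensure_lto() {
  enable_languages=$1
  case ,${enable_languages}, in
    *,lto,*) ;;                                        # already listed
    *) enable_languages=${enable_languages},lto ;;     # append it
  esac
  printf '%s\n' "$enable_languages"
}

ensure_lto c,c++,fortran   # list without lto: lto gets appended
ensure_lto c,lto,c++       # list with lto: returned unchanged
```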
Offloading compilers' support libraries (was: nvptx offloading patches [3/n], RFD)
Hi!

On Thu, 19 Feb 2015 10:28:46 +0100, Bernd Schmidt <ber...@codesourcery.com> wrote:
> On 02/17/2015 05:40 PM, Jakub Jelinek wrote:
> > On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > > Third attempt failed with:
> > > > ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> > > > compilation terminated.
> > > > ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> > > > make[2]: *** [realloc.o] Error 1
> > > > make[2]: *** Waiting for unfinished jobs
> > > > make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
> > > >
> > > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I
> > > > expected it would be built in-tree, is that not the case (at least
> > > > wiki/Offloading mentions that).  Or is it just that libgcc can't
> > > > really have dependencies on newlib headers as newlib is built after
> > > > libgcc?
> > >
> > > I've committed this patch to fix this last issue (the header
> > > dependence, that is; I don't know about the in-tree build).
> >
> > Thanks, sure, libgcc now builds fine, the in-tree build fails:
> > configure:4261: checking for C compiler default output file name
> > configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include -g -O2 conftest.c >&5
> > error opening libc.a
> > collect2: error: ld returned 1 exit status
> > very early during in-tree newlib configure.
>
> Not a fix for your problem, but there's a similar issue when trying to
> get at the libgcc for the nvptx accel compiler after it's been installed.
> The libgcc Makefile puts it in the wrong place -
> gcc/nvptx-none/accel/nvptx-none instead of gcc/host/accel/nvptx-none.

I also wondered about this; it's somewhere on my TODO list...

> The patch below corrects that and removes an intelmicemul special case
> which I believe has the same effect - Ilya, could you test this?

This code was originally posted in
http://news.gmane.org/find-root.php?message_id=%3C20140926123551.GA6892%40msticlxl57.ims.intel.com%3E.

This specific buglet aside (that the handling of intelmic and nvptx
offloading is inconsistent) -- will we have to add such handling to each
and every library that is built for the offloading compilers?  (Including
libraries that aren't part of the GCC sources, but may be built as part of
GCC's build process, such as when newlib is linked into [GCC]/newlib?)

One step back -- I understand correctly that this change is to make sure
that the regular target compiler and the offloading compilers don't clash
in their installed files' names?  (By putting them into the
accel/[offloading architecture]/ subdirectory?)  (As I've written in
http://news.gmane.org/find-root.php?message_id=%3C87vbize7zi.fsf%40schwinge.name%3E,
I currently install into separate prefixes/DESTDIRS, because I have not
yet verified that there is no overlap in the installed files.)

Then, why does this only apply to libsubdir?  What about header files,
documentation files, and so on?  (If they aren't expected to differ
between the target and offloading compilers, I think it's still not a good
idea to arbitrarily have them be overwritten by one respective build
tree's make install process.)

Should we have a more general solution to this problem?

> Index: libgcc/Makefile.in
> ===================================================================
> --- libgcc/Makefile.in (revision 445788)
> +++ libgcc/Makefile.in (working copy)
> @@ -45,6 +45,7 @@ fixed_point = @fixed_point@
>  with_aix_soname = @with_aix_soname@
>
>  host_noncanonical = @host_noncanonical@
> +real_host_noncanonical = @real_host_noncanonical@
>  target_noncanonical = @target_noncanonical@
>
>  # List of extra object files that should be compiled for this target machine.
> @@ -185,7 +186,7 @@ STRIP = @STRIP@
>  STRIP_FOR_TARGET = $(STRIP)
>
>  # Directory in which the compiler finds libraries etc.
> -libsubdir = $(libdir)/gcc/$(host_noncanonical)/$(version)@accel_dir_suffix@
> +libsubdir = $(libdir)/gcc/$(real_host_noncanonical)/$(version)@accel_dir_suffix@
>  # Used to install the shared libgcc.
>  slibdir = @slibdir@
>  # Maybe used for DLLs on Windows targets.
> Index: libgcc/configure.ac
> ===================================================================
> --- libgcc/configure.ac (revision 445788)
> +++ libgcc/configure.ac (working copy)
> @@ -398,16 +398,14 @@ esac
>
>  # Used for constructing correct paths for offload compilers.
>  accel_dir_suffix=
> +real_host_noncanonical=${host_noncanonical}
> +echo eaaf: $enable_as_accelerator_for
>  if test x$enable_as_accelerator_for != x; then
>    accel_dir_suffix=/accel/${target_noncanonical}
> -  case ${target_noncanonical} in
> -
Re: If we're building an offloading compiler, always enable the LTO front end (was: nvptx offloading patches [3/n], RFD)
On Thu, Feb 19, 2015 at 11:48:17AM +0100, Thomas Schwinge wrote:
> Like this?

Yes.

> commit 56c0312469f583ba3fa9fa2777981742ab6d6c75
> Author: Thomas Schwinge <tho...@codesourcery.com>
> Date:   Thu Feb 19 11:41:23 2015 +0100
>
>     If we're building an offloading compiler, always enable the LTO front end.
>
>         * configure.ac [--enable-as-accelerator-for] (enable_languages):
>         Make sure it contains lto.
>         * configure: Regenerate.

Ok for trunk.

        Jakub
Re: nvptx offloading patches [3/n], RFD
On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I
> > > > expected it would be built in-tree, is that not the case (at least
> > > > wiki/Offloading mentions that).
> >
> > configure:4261: checking for C compiler default output file name
> > configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include -g -O2 conftest.c >&5
> > error opening libc.a
> > collect2: error: ld returned 1 exit status
> > very early during in-tree newlib configure.
>
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to the
> directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

My bad.  Yes, that does resolve the issue, make && make install now worked
for nvptx-none for me with the patches (2 from Bernd, my mode_table, my
t-nvptx).

Can you or Bernd comment on the other issues I've raised, i.e. whether you
are going to apply Bernd's approved patches, on the t-nvptx fix?
I'll try to have a look at the va_list stuff, if it blocks everything
rather than just testcases with va_list being offloaded.

        Jakub
Re: nvptx offloading patches [3/n], RFD
Hi!

On Mon, 16 Feb 2015 22:08:12 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> > I think (also communicated that on IRC) we should instead try not
> > streaming machine-modes at all but generating them at stream-in time via
> > layout_type or layout_decl.
>
> Here is a WIP prototype for being able to stream a machine mode
> description table and streaming it back in.  [...]

Many thanks for that!  (I had modified Bernd's patch to be less intrusive,
see attached, but of course that didn't resolve its design problem.)

On Mon, 16 Feb 2015 22:43:49 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> [updated patch]

No regressions with
--enable-offload-targets=nvptx-none=[...],x86_64-intelmicemul-linux-gnu=[...].


Grüße,
 Thomas

commit 97a1ad0d3a96321ded8fad5e3a3cc75b46970bfa
Author: Thomas Schwinge <tho...@codesourcery.com>
Date:   Fri Feb 13 19:51:09 2015 +0100

    Use the offload host CPU's modes.def when building an offloading compiler: make it less intrusive.

diff --git gcc/config.gcc gcc/config.gcc
index ebf0ee6..265ac0e 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -482,15 +482,15 @@ tilepro*-*-*)
         ;;
 esac

-offload_host_cpu_type=${cpu_type}
-if test x${enable_as_accelerator} != xno
-then
-        offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
-fi
-case ${offload_host_cpu_type} in
-x86_64)
-        offload_host_cpu_type=i386
-        ;;
+modes_cpu_type=${cpu_type}
+case ${enable_as_accelerator}:${target} in
+yes:nvptx-*-*)
+        modes_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+        case ${modes_cpu_type} in
+        x86_64)
+                modes_cpu_type=i386
+                ;;
+        esac
 esac

 tm_file=${cpu_type}/${cpu_type}.h
@@ -499,9 +499,9 @@ then
         tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+if test -f ${srcdir}/config/${modes_cpu_type}/${modes_cpu_type}-modes.def
 then
-        extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
+        extra_modes=${modes_cpu_type}/${modes_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
diff --git gcc/config/i386/i386-modes.def gcc/config/i386/i386-modes.def
index 766681b..0b6a1f1 100644
--- gcc/config/i386/i386-modes.def
+++ gcc/config/i386/i386-modes.def
@@ -24,9 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);

-/* This file may be used when building a compiler for an offload target.
-   Assume that no special floating point options are used.  */
-#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -36,7 +33,6 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
                           : ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
-#endif

 /* Add any extra modes needed to represent the condition code.

diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index 9a9954b..c0d97ee 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -64,6 +64,14 @@
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64

+#ifdef ACCEL_COMPILER
+/* For ../i386/i386-modes.def.  */
+/* See ../i386/unix.h:TARGET_SUBTARGET64_DEFAULT.  */
+# define TARGET_128BIT_LONG_DOUBLE (TARGET_ABI64)
+/* See ../i386/i386.h:TARGET_96_ROUND_53_LONG_DOUBLE.  */
+# define TARGET_96_ROUND_53_LONG_DOUBLE 0
+#endif
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
 #undef PTRDIFF_TYPE
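
The selection logic in the config.gcc hunk can be exercised outside of GCC's build system. This is a minimal sketch under stated assumptions: the wrapper function modes_cpu_type_for and its argument order are hypothetical, and nvptx-unknown-none is used only as an illustrative target value that matches the nvptx-*-* pattern. The body follows the patched lines: take the CPU part of the offload host triplet, and map x86_64 to the i386 configuration directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the patched config.gcc logic.
# Usage: modes_cpu_type_for CPU_TYPE ENABLE_AS_ACCELERATOR TARGET ACCELERATOR_FOR
modes_cpu_type_for() {
  cpu_type=$1
  enable_as_accelerator=$2
  target=$3
  enable_as_accelerator_for=$4

  modes_cpu_type=$cpu_type
  case ${enable_as_accelerator}:${target} in
  yes:nvptx-*-*)
    # Keep everything before the first '-' of the offload host triplet.
    modes_cpu_type=$(echo "$enable_as_accelerator_for" | sed 's/-.*$//')
    case $modes_cpu_type in
    x86_64)
      # x86_64 shares the i386 configuration directory.
      modes_cpu_type=i386
      ;;
    esac
    ;;
  esac
  printf '%s\n' "$modes_cpu_type"
}

# nvptx offload compiler for an x86_64 host: uses i386/i386-modes.def.
modes_cpu_type_for nvptx yes nvptx-unknown-none x86_64-pc-linux-gnu
# Standalone nvptx compiler: falls back to its own cpu_type.
modes_cpu_type_for nvptx no nvptx-unknown-none ""
```

Compared with the version it replaces, the lookup is restricted to nvptx offload compilers, so other accelerator targets such as intelmicemul keep using their own modes file.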
Re: nvptx offloading patches [3/n], RFD
Hi!

On Tue, 17 Feb 2015 17:40:33 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote:
> > On Tue, 17 Feb 2015, Jakub Jelinek wrote:
> > > I have nvptx-newlib symlinked into the gcc tree as newlib, so I
> > > expected it would be built in-tree, is that not the case (at least
> > > wiki/Offloading mentions that).
>
> configure:4261: checking for C compiler default output file name
> configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include -g -O2 conftest.c >&5
> error opening libc.a
> collect2: error: ld returned 1 exit status
> very early during in-tree newlib configure.

Do you literally have »nvptx-newlib symlinked into the gcc tree as
newlib«?  If yes, then that should explain the problem: as I wrote in
http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
you need to »add a symbolic link to nvptx-newlib's newlib directory to the
directory containing the GCC sources«, so not link [GCC]/newlib ->
[newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
resolve the issue?


Grüße,
 Thomas
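
The difference between the two symlink layouts can be checked in a throwaway directory. The sketch below is an illustration only: check_layout and its two mode names are hypothetical, and the libc/include subdirectory is assumed from the -isystem /usr/src/gcc/newlib/libc/include path in the quoted configure log. It builds an empty stand-in for the nvptx-newlib checkout and reports whether [GCC]/newlib/libc/include is reachable through the link.

```shell
#!/bin/sh
# Hypothetical helper: create a scratch tree, link newlib one of two ways,
# and report whether [GCC]/newlib/libc/include resolves.
check_layout() {
  work=$(mktemp -d) || return 1
  mkdir -p "$work/gcc" "$work/nvptx-newlib/newlib/libc/include"
  case $1 in
    repo-root)
      # [GCC]/newlib -> [nvptx-newlib]: the layout that failed.
      ln -s "$work/nvptx-newlib" "$work/gcc/newlib" ;;
    newlib-subdir)
      # [GCC]/newlib -> [nvptx-newlib]/newlib: the layout that is needed.
      ln -s "$work/nvptx-newlib/newlib" "$work/gcc/newlib" ;;
    *)
      rm -rf "$work"
      return 2 ;;
  esac
  if [ -d "$work/gcc/newlib/libc/include" ]; then
    echo found
  else
    echo missing
  fi
  rm -rf "$work"
}

check_layout repo-root       # link to the checkout's top level
check_layout newlib-subdir   # link to the checkout's newlib directory
```

Linking to the top level of the checkout puts the real sources one directory too deep (newlib/newlib/libc), so the headers GCC's toplevel build expects under newlib/libc/include are not there.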
Re: nvptx offloading patches [3/n], RFD
On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote:
> Do you literally have »nvptx-newlib symlinked into the gcc tree as
> newlib«?  If yes, then that should explain the problem: as I wrote in
> http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E,
> you need to »add a symbolic link to nvptx-newlib's newlib directory to the
> directory containing the GCC sources«, so not link [GCC]/newlib ->
> [newlib-nvptx], but [GCC]/newlib -> [newlib-nvptx]/newlib.  Does that
> resolve the issue?

BTW, --with-cuda-driver-{include,lib} are apparently not documented in
gcc/doc/ (--with-cuda-driver neither, but can't use that, as lib is
/usr/local/cuda-6.5/lib64 in my case), and aren't documented on
wiki/Offloading either.

../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

and

../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64
make; make DESTDIR=/usr/src/gcc/objnvptxinst install

compilers now build, but offloading fails:
/usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

Is --enable-languages=c,c++,fortran,lto required when configuring
the offload compiler?  It isn't required for intelmic.

        Jakub
Re: nvptx offloading patches [3/n], RFD
On Tue, Feb 17, 2015 at 11:00:14AM +0100, Richard Biener wrote:
> I'm just looking for a way to make this less of a hack (and the LTO IL
> less target dependent).  Not for GCC 5 for which something like your
> patch is probably ok, but for the future.

So, given Ilya's and Thomas' testing, is this acceptable for now, and
perhaps we can try to do something better for GCC 6?

Here is the patch with full ChangeLog:

2015-02-18  Jakub Jelinek  <ja...@redhat.com>

        * passes.c (ipa_write_summaries_1): Call lto_output_init_mode_table.
        (ipa_write_optimization_summaries): Likewise.
        * tree-streamer.h: Include data-streamer.h.
        (streamer_mode_table): Declare extern variable.
        (bp_pack_machine_mode, bp_unpack_machine_mode): New inline functions.
        * lto-streamer-out.c (lto_output_init_mode_table,
        lto_write_mode_table): New functions.
        (produce_asm_for_decls): Call lto_write_mode_table when streaming
        offloading LTO.
        * lto-section-in.c (lto_section_name): Add mode_table entry.
        (lto_create_simple_input_block): Add mode_table argument to the
        lto_input_block constructors.
        * ipa-prop.c (ipa_prop_read_section, read_replacements_section):
        Likewise.
        * data-streamer-in.c (string_for_index): Likewise.
        * ipa-inline-analysis.c (inline_read_section): Likewise.
        * ipa-icf.c (sem_item_optimizer::read_section): Likewise.
        * lto-cgraph.c (input_cgraph_opt_section): Likewise.
        * lto-streamer-in.c (lto_read_body_or_constructor,
        lto_input_toplevel_asms): Likewise.
        (lto_input_mode_table): New function.
        * tree-streamer-out.c (pack_ts_fixed_cst_value_fields,
        pack_ts_decl_common_value_fields, pack_ts_type_common_value_fields):
        Use bp_pack_machine_mode.
        * real.h (struct real_format): Add name field.
        * lto-streamer.h (enum lto_section_type): Add LTO_section_mode_table.
        (class lto_input_block): Add mode_table member.
        (lto_input_block::lto_input_block): Add mode_table_ argument,
        initialize mode_table.
        (struct lto_file_decl_data): Add mode_table field.
        (lto_input_mode_table, lto_output_init_mode_table): New prototypes.
        * tree-streamer-in.c (unpack_ts_fixed_cst_value_fields,
        unpack_ts_decl_common_value_fields,
        unpack_ts_type_common_value_fields): Call bp_unpack_machine_mode.
        * tree-streamer.c (streamer_mode_table): New variable.
        * real.c (ieee_single_format, mips_single_format,
        motorola_single_format, spu_single_format, ieee_double_format,
        mips_double_format, motorola_double_format,
        ieee_extended_motorola_format, ieee_extended_intel_96_format,
        ieee_extended_intel_128_format,
        ieee_extended_intel_96_round_53_format, ibm_extended_format,
        mips_extended_format, ieee_quad_format, mips_quad_format,
        vax_f_format, vax_d_format, vax_g_format, decimal_single_format,
        decimal_double_format, decimal_quad_format, ieee_half_format,
        arm_half_format, real_internal_format): Add name field.
        * config/pdp11/pdp11.c (pdp11_f_format, pdp11_d_format): Likewise.
lto/
        * lto.c (lto_mode_identity_table): New variable.
        (lto_read_decls): Add mode_table argument to the lto_input_block
        constructor.
        (lto_file_finalize): Initialize mode_table.
        (lto_init): Initialize lto_mode_identity_table.

--- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c 2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;

+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);

   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;

+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
--- gcc/tree-streamer.h.jj 2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.

 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"

 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);

 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
                                  hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
Re: nvptx offloading patches [3/n], RFD
Hi Jakub! (Will respond to your other questions later.) On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek ja...@redhat.com wrote: On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote: Do you literally have »nvptx-newlib symlinked into the gcc tree as newlib«? If yes, then that should explain the problem: as I wrote in http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E, you need to »add a symbolic link to nvptx-newlib's newlib directory to the directory containing the GCC sources«, so not link [GCC]/newlib - [newlib-nvptx], but [GCC]/newlib - [newlib-nvptx]/newlib. Does that resolve the issue? (It did.) Can you suggest a better wording, to make this more clear in the documentation? BTW, --with-cuda-driver-{include,lib} are apparently not documented in gcc/doc/ (--with-cuda-driver neither, but can't use that, as lib is /usr/local/cuda-6.5/lib64 in my case), and isn't documented on wiki/Offloading either. Thanks for reporting; will fix that. ../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long make; make DESTDIR=/usr/src/gcc/objnvptxinst install and ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=nvptx-none=/usr/src/gcc/objnvptxinst --disable-bootstrap --with-cuda-driver-include=/usr/local/cuda-6.5/include --with-cuda-driver-lib=/usr/local/cuda-6.5/lib64 make; make DESTDIR=/usr/src/gcc/objnvptxinst install compilers now build That looks very similar to what I'm using. I currently install into separate prefixes/DESTDIRS, because I have not yet verified that there is no overlap in the installed files. 
> offloading fails:
> /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized
> mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> compilation terminated.
> lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> compilation terminated.
> /usr/bin/ld: lto-wrapper failed
> collect2: error: ld returned 1 exit status
>
> Is --enable-languages=c,c++,fortran,lto required when configuring the offload compiler? It isn't required for intelmic.

Yes, exactly. I assume the reason is that x86_64-intelmicemul-linux-gnu defaults to supporting LTO, and due to this also defaults to building the LTO front end. I'll enhance the nvptx offloading documentation accordingly. Maybe we should add some magic to build the LTO front end if --enable-as-accelerator-for=[...] has been specified?

Note that I recently added another prerequisite patch for nvptx offloading to https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading: http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E. If that is not applied, you'll get run-time errors because in libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction can't find main$_omp_fn$0 and similar symbols.

Grüße,
 Thomas
Re: nvptx offloading patches [3/n], RFD
On Wed, Feb 18, 2015 at 01:09:53PM +0100, Thomas Schwinge wrote: On Wed, 18 Feb 2015 12:34:38 +0100, Jakub Jelinek ja...@redhat.com wrote: On Wed, Feb 18, 2015 at 10:12:19AM +0100, Thomas Schwinge wrote: Do you literally have »nvptx-newlib symlinked into the gcc tree as newlib«? If yes, then that should explain the problem: as I wrote in http://news.gmane.org/find-root.php?message_id=%3C87egq8mir1.fsf%40schwinge.name%3E, you need to »add a symbolic link to nvptx-newlib's newlib directory to the directory containing the GCC sources«, so not link [GCC]/newlib - [newlib-nvptx], but [GCC]/newlib - [newlib-nvptx]/newlib. Does that resolve the issue? (It did.) Can you suggest a better wording, to make this more clear in the documentation? Your wording is fine, but should be listed on wiki/Offloading and doc/install.texi perhaps too? offloading fails: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload @/tmp/cce9PdmR x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: language lto not recognized mkoffload: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status compilation terminated. lto-wrapper: fatal error: /usr/src/gcc/objnvptxinst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/5.0.0//accel/nvptx-none/mkoffload returned 1 exit status compilation terminated. /usr/bin/ld: lto-wrapper failed collect2: error: ld returned 1 exit status Is --enable-languages=c,c++,fortran,lto required when configuring the offload compiler? It isn't required for intelmic. Yes, exactly. I assume the reason is that x86_64-intelmicemul-linux-gnu defaults to supporting LTO, and due to this also defaults to building the LTO front end. I'll enhance the nvptx offloading documentation accordingly. Maybe we should add some magic to build the LTO front end if --enable-as-accelerator-for=[...] 
> has been specified?

Toplevel configure.ac has:

    # If LTO is enabled, add the LTO front end.
    if test "$enable_lto" = "yes" ; then
      case ,${enable_languages}, in
        *,lto,*) ;;
        *) enable_languages="${enable_languages},lto" ;;
      esac
      if test "${build_lto_plugin}" = "yes" ; then
        configdirs="$configdirs lto-plugin"
      fi
    fi

so IMHO we want a similar snippet for the --enable-as-accelerator-for= case, perhaps right below this one. Not building the lto FE for the accelerator compilers makes them completely useless, thus I think we really want to do that automatically.

> Note that I recently added another prerequisite patch for nvptx offloading to https://gcc.gnu.org/wiki/Offloading#nvptx_Offloading: http://news.gmane.org/find-root.php?message_id=%3C546CF508.9010807%40codesourcery.com%3E. If that is not applied, you'll get run-time errors because in libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_get_table, cuModuleGetFunction can't find main$_omp_fn$0 and similar symbols.

Can you adjust that to add a cgraph flag alongside of the offloadable instead and use that instead of the attribute?

	Jakub
Re: nvptx offloading patches [3/n], RFD
On Tue, Feb 17, 2015 at 04:21:06PM +, Joseph Myers wrote: On Tue, 17 Feb 2015, Jakub Jelinek wrote: Third attempt failed with: ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory compilation terminated. ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed make[2]: *** [realloc.o] Error 1 make[2]: *** Waiting for unfinished jobs make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc' I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it would be built in-tree, is that not the case (at least wiki/Offloading mentions that). Or is it just that libgcc can't really have dependencies on newlib headers as newlib is built after libgcc? I've committed this patch to fix this last issue (the header dependence, that is; I don't know about the in-tree build). Thanks, sure, libgcc now builds fine, the in-tree build fails: configure:4261: checking for C compiler default output file name configure:4283: /usr/src/gcc/objnvptx/./gcc/xgcc -B/usr/src/gcc/objnvptx/./gcc/ -nostdinc -B/usr/src/gcc/objnvptx/nvptx-none/newlib/ -isystem /usr/src/gcc/objnvptx/nvptx-none/newlib/targ-include -isystem /usr/src/gcc/newlib/libc/include -B/usr/local/nvptx-none/bin/ -B/usr/local/nvptx-none/lib/ -isystem /usr/local/nvptx-none/include -isystem /usr/local/nvptx-none/sys-include-g -O2 conftest.c 5 error opening libc.a collect2: error: ld returned 1 exit status very early during in-tree newlib configure. Jakub
Re: nvptx offloading patches [3/n], RFD
On Tue, 17 Feb 2015, Jakub Jelinek wrote:

> Third attempt failed with:
> ../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
> compilation terminated.
> ../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
> make[2]: *** [realloc.o] Error 1
> make[2]: *** Waiting for unfinished jobs
> make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'
>
> I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it would be built in-tree, is that not the case (at least wiki/Offloading mentions that). Or is it just that libgcc can't really have dependencies on newlib headers as newlib is built after libgcc?

I've committed this patch to fix this last issue (the header dependence, that is; I don't know about the in-tree build).

2015-02-17  Joseph Myers  jos...@codesourcery.com

	* config/nvptx/realloc.c: Include stddef.h instead of stdlib.h
	and string.h.
	(__nvptx_realloc): Call __builtin_memcpy instead of memcpy.

Index: libgcc/config/nvptx/realloc.c
===================================================================
--- libgcc/config/nvptx/realloc.c	(revision 220763)
+++ libgcc/config/nvptx/realloc.c	(working copy)
@@ -21,8 +21,7 @@
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
-#include <stdlib.h>
-#include <string.h>
+#include <stddef.h>
 #include "nvptx-malloc.h"
 
 void *
@@ -44,7 +43,7 @@ __nvptx_realloc (void *ptr, size_t newsz)
       oldsz = *sp;
     }
   if (oldsz != 0)
-    memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
+    __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
 
   __nvptx_free (ptr);
   return newptr;

-- 
Joseph S. Myers
jos...@codesourcery.com
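For readers outside the GCC tree, the pattern this patch switches to can be sketched as follows: a realloc-style routine that needs only stddef.h (for size_t and NULL) and copies with __builtin_memcpy, so it does not depend on any C library header being installed yet. This is only an illustration under stated assumptions. The names sketch_malloc, sketch_free and sketch_realloc, the SKETCH_HDR header layout and the bump allocator are hypothetical stand-ins and not the nvptx-malloc.h interface; the example also assumes a compiler that provides __builtin_memcpy (GCC or Clang) and C11 for _Alignas.

```c
#include <stddef.h>   /* size_t and NULL only; no stdlib.h or string.h */

/* Hypothetical stand-in allocator, not the nvptx-malloc.h interface:
   a bump allocator that records each block's size in a header placed
   in front of the pointer it hands out.  */
#define SKETCH_HDR 16   /* header bytes; keeps payloads 16-byte aligned */

static _Alignas (16) unsigned char sketch_heap[4096];
static size_t sketch_used;

void *
sketch_malloc (size_t size)
{
  if (size > sizeof sketch_heap)
    return NULL;
  size_t need = SKETCH_HDR + ((size + 15) & ~(size_t) 15);
  if (need > sizeof sketch_heap - sketch_used)
    return NULL;
  unsigned char *base = sketch_heap + sketch_used;
  sketch_used += need;
  __builtin_memcpy (base, &size, sizeof size);   /* store the size header */
  return base + SKETCH_HDR;
}

void
sketch_free (void *ptr)
{
  (void) ptr;   /* a bump allocator never reclaims memory */
}

/* Allocate a new block, copy min (old size, new size) bytes, release
   the old block.  On allocation failure the old block is left alone.  */
void *
sketch_realloc (void *ptr, size_t newsz)
{
  void *newptr = sketch_malloc (newsz);
  if (newptr == NULL)
    return NULL;
  if (ptr != NULL)
    {
      size_t oldsz;
      __builtin_memcpy (&oldsz, (unsigned char *) ptr - SKETCH_HDR,
                        sizeof oldsz);
      if (oldsz != 0)
        __builtin_memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
      sketch_free (ptr);
    }
  return newptr;
}
```

Calling sketch_realloc (NULL, n) behaves like an allocation, and growing or shrinking a block preserves the leading min (old, new) bytes, which is the same copy-length expression the patch keeps.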
Re: nvptx offloading patches [3/n], RFD
On Tue, Feb 17, 2015 at 04:32:06PM +0300, Ilya Verbin wrote: If we don't try to write .gnu.offload_lto_* again, I think following patch with additionally not calling lto_write_mode_table for !lto_stream_offload_p and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build a single shared identity table - might actually work. Thoughts on this? Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_* sections and produce temporary partitions with .gnu.lto_* sections. And the ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections? FYI, I have tested my mode_table patch with the intelmic emul offloading and saw no regressions. Then I went over and wanted to try nvptx offloading, but running into various issues. I had two patches from Bernd (already approved, why they haven't been installed?) applied, had to tweak the first one so that it applies, then my mode_table patch. I've built nvptx-tools and configured: ../configure --target=nvptx-none --enable-as-accelerator-for=x86_64-pc-linux-gnu --with-build-time-tools=/usr/src/gcc/objnvptxinst/usr/local/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long make -j16 This failed miserably, because of missing mkoffload.o dependencies, patch attached (ok for trunk?; it does what intelmic mkoffload.o does; I've tried to add | $(generated_files) dependency instead, but that somehow didn't work). The second attempt with that fixed died because for some reason nvptx-none-as wants to verify by default using ptxas. Can that be made configurable? E.g. for building nvptx offloading in distros, I believe due to the proprietary cuda stuff it will be better if everything can be built without the proprietary stuff and only used when actually running it. E.g. could the verification be done by default only if ptxas is found in $PATH and not if it isn't found? 
Third attempt failed with:

../../../libgcc/config/nvptx/realloc.c:24:20: fatal error: stdlib.h: No such file or directory
compilation terminated.
../../../libgcc/static-object.mk:17: recipe for target 'realloc.o' failed
make[2]: *** [realloc.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory '/usr/src/gcc/objnvptx/nvptx-none/libgcc'

I have nvptx-newlib symlinked into the gcc tree as newlib, so I expected it would be built in-tree, is that not the case (at least wiki/Offloading mentions that). Or is it just that libgcc can't really have dependencies on newlib headers as newlib is built after libgcc?

	Jakub

	* cgraph.h (clone_function_name_1): Declare.
	* cgraphclones.c (clone_function_name_1): New function.
	(clone_function_name): Use it.
	* lto-partition.c: Include stringpool.h.
	(must_not_rename, maybe_rewrite_identifier,
	validize_symbol_for_target): New static functions.
	(privatize_symbol_name): Use must_not_rename.
	(promote_symbol): Call validize_symbol_for_target.
	(lto_promote_cross_file_statics): Likewise.
	(lto_promote_statics_nonwpa): Likewise.

--- gcc/cgraph.h.jj	2015-02-16 11:19:03.474984223 +0100
+++ gcc/cgraph.h	2015-02-17 13:54:00.413964133 +0100
@@ -2206,6 +2206,7 @@ basic_block init_lowered_empty_function
 
 /* In cgraphclones.c  */
 
+tree clone_function_name_1 (const char *, const char *);
 tree clone_function_name (tree decl, const char *);
 
 void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
--- gcc/cgraphclones.c.jj	2015-02-17 10:07:53.208582797 +0100
+++ gcc/cgraphclones.c	2015-02-17 13:54:00.413964133 +0100
@@ -533,19 +533,19 @@ cgraph_node::create_clone (tree decl, gc
   return new_node;
 }
 
-/* Return a new assembler name for a clone of DECL with SUFFIX.  */
-
 static GTY(()) unsigned int clone_fn_id_num;
 
+/* Return a new assembler name for a clone with SUFFIX of a decl named
+   NAME.  */
+
 tree
-clone_function_name (tree decl, const char *suffix)
+clone_function_name_1 (const char *name, const char *suffix)
 {
-  tree name = DECL_ASSEMBLER_NAME (decl);
-  size_t len = IDENTIFIER_LENGTH (name);
+  size_t len = strlen (name);
   char *tmp_name, *prefix;
 
   prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
-  memcpy (prefix, IDENTIFIER_POINTER (name), len);
+  memcpy (prefix, name, len);
   strcpy (prefix + len + 1, suffix);
 #ifndef NO_DOT_IN_LABEL
   prefix[len] = '.';
@@ -558,6 +558,16 @@ clone_function_name (tree decl, const ch
   return get_identifier (tmp_name);
 }
 
+/* Return a new assembler name for a clone of DECL with SUFFIX.  */
+
+tree
+clone_function_name (tree decl, const char *suffix)
+{
+  tree name = DECL_ASSEMBLER_NAME (decl);
+  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
+}
+
+
 /* Create callgraph node clone with new declaration.  The actual body will
    be copied later at compilation stage.
--- gcc/lto/lto-partition.c.jj	2015-01-15 14:05:08.706092596 +0100
+++ gcc/lto/lto-partition.c
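The refactoring in the cgraphclones.c hunks above, where the string-based worker is split out and the decl-based entry point becomes a thin wrapper, can be sketched outside GCC roughly like this. Everything here is a hypothetical stand-in: sketch_clone_name_1, sketch_clone_name, struct sketch_decl and sketch_clone_id are invented for illustration, storage comes from malloc rather than GCC's identifier table, and the result format "name.suffix.N" assumes a target where '.' is allowed in labels.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counter shared by all generated names, playing the role that
   clone_fn_id_num has in the patch.  */
static unsigned int sketch_clone_id;

/* Build "<name>.<suffix>.<N>" in freshly allocated storage.
   Assumption: '.' is a legal label character on the target.
   The caller frees the result.  */
char *
sketch_clone_name_1 (const char *name, const char *suffix)
{
  /* Room for both strings, two dots, a decimal counter and the NUL.  */
  size_t len = strlen (name) + strlen (suffix) + 32;
  char *buf = malloc (len);
  if (buf == NULL)
    return NULL;
  snprintf (buf, len, "%s.%s.%u", name, suffix, sketch_clone_id++);
  return buf;
}

/* Hypothetical stand-in for a declaration carrying an assembler name.  */
struct sketch_decl
{
  const char *asm_name;
};

/* Thin wrapper: extract the name from the decl and defer to the
   string-based worker, as clone_function_name does after the patch.  */
char *
sketch_clone_name (const struct sketch_decl *decl, const char *suffix)
{
  return sketch_clone_name_1 (decl->asm_name, suffix);
}
```

Splitting the worker out this way lets callers that only hold a string (for example a symbol being renamed during partitioning) get a fresh name without having a decl at hand, while existing decl-based callers keep their interface.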
Re: nvptx offloading patches [3/n], RFD
On Mon, Feb 16, 2015 at 22:08:12 +0100, Jakub Jelinek wrote: Anyway, the question is if for offloading we use wpa stage at all these days or not at all, if there is a way for ACCEL_COMPILER to differentiate somehow between LTO sections written by the host compiler and LTO sections perhaps created by the offloading compiler when trying to LTO the thing (if it does it at all). Because obviously the host compiler written LTO (in .gnu.offload_lto_*) would need the machine modes translated, while LTO streamed already by the ACCEL_COMPILER (if any) generally would already use the offloading target machine modes and therefore should be treated as native lto (.gnu.lto_*). Currently both intelmic and nvptx offloading compilers are executed in non-partitioned LTO mode. I don't know whether we need to support WHOPR (WPA+LTRANS) mode. Maybe it would be useful for programs with large number of target regions? But I think this is not needed for GCC 5. If we don't try to write .gnu.offload_lto_* again, I think following patch with additionally not calling lto_write_mode_table for !lto_stream_offload_p and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build a single shared identity table - might actually work. Thoughts on this? Probably the ACCEL_COMPILER in WPA mode (flag_wpa) can read .gnu.offload_lto_* sections and produce temporary partitions with .gnu.lto_* sections. And the ACCEL_COMPILER in LTRANS mode (flag_ltrans) will read .gnu.lto_* sections? -- Ilya
Re: nvptx offloading patches [3/n], RFD
On Mon, Feb 16, 2015 at 10:43 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote: Seeing the real format string you introduce I wonder if identifying modes by their names wouldn't work in 99% of all cases (apart from PSImode maybe). There are various corner cases. Plus of course sometimes insignificant, but sometimes very significant, floating mode changes. SFmode on one target might be completely different from another target. But we can't deal with arbitrary target differences anyway - otherwise we have generated wrong code already. Also for most cases we can construct the machine mode from the type. Or where that is not possible stream the extra info that is necessary instead. I thought we've discussed that already on IRC. E.g. decimal modes are identified only by mode and nothing else, and it doesn't look like it can be easily derived from types in many cases (spent quite some time on that). Sure, still modes and types have quite some overlap in information so we might be able to do more compact streaming (and at the same time not rely on the machine-mode enum). The machine-modes of course are very compact to stream (they are basically a common set of all possible types), and your mapping introduces kind of a cache for common type properties. I know that Honza wanted to make trees slimmer by taking into account more (redundant) information from the modes associated with trees. I'm just looking for a way to make this less of a hack (and the LTO IL less target dependent). Not for GCC 5 for which something like your patch is probably ok, but for the future. Overall feels like a hack BTW :) can't we assign machine mode enum IDs in a target independent way? I mean, it doesn't have to be densely allocated? We iterate over modes, we have tons of tables indexed by modes, so if we introduce gaps, we'll make the compiler bigger and slower. 
If this is limited to the offloading path, like in the attached updated patch, the overhead for native LTO should be not measurable. Sure. Thanks, Richard.
Re: nvptx offloading patches [3/n], RFD
On February 16, 2015 10:08:12 PM CET, Jakub Jelinek ja...@redhat.com wrote: Hi! On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote: I think (also communicated that on IRC) we should instead try not streaming machine-modes at all but generating them at stream-in time via layout_type or layout_decl. Here is a WIP prototype for being able to stream a machine mode description table and streaming it back in. In the end, I'd like to stream this out only for lto_stream_offload_p and stream it in only for ACCEL_COMPILER reading in when available, but wanted to see what it does even for native LTO. For that it doesn't work very well, because it seems that wpa phase doesn't stream in some sections and stream them out again, but instead somehow copies them directly to the output object, so the mode table isn't aware of the modes used in there that were bypassed this way. Anyway, the question is if for offloading we use wpa stage at all these days or not at all, if there is a way for ACCEL_COMPILER to differentiate somehow between LTO sections written by the host compiler and LTO sections perhaps created by the offloading compiler when trying to LTO the thing (if it does it at all). Because obviously the host compiler written LTO (in .gnu.offload_lto_*) would need the machine modes translated, while LTO streamed already by the ACCEL_COMPILER (if any) generally would already use the offloading target machine modes and therefore should be treated as native lto (.gnu.lto_*). If we don't try to write .gnu.offload_lto_* again, I think following patch with additionally not calling lto_write_mode_table for !lto_stream_offload_p and not calling lto_input_mode_table for !ACCEL_COMPILER - instead build a single shared identity table - might actually work. Thoughts on this? Seeing the real format string you introduce I wonder if identifying modes by their names wouldn't work in 99% of all cases (apart from PSImode maybe). 
Also for most cases we can construct the machine mode from the type. Or where that is not possible stream the extra info that is necessary instead. Overall feels like a hack BTW :) can't we assign machine mode enum IDs in a target independent way? I mean, it doesn't have to be densely allocated? Richard. Bernd/Thomas, do you plan to commit the other approved patches soon? --- gcc/passes.c.jj2015-02-16 20:14:09.477345693 +0100 +++ gcc/passes.c 2015-02-16 20:26:23.659299189 +0100 @@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode struct lto_out_decl_state *state = lto_new_out_decl_state (); state-symtab_node_encoder = encoder; + lto_output_init_mode_table (); lto_push_out_decl_state (state); gcc_assert (!flag_wpa); @@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy lto_symtab_encoder_iterator lsei; state-symtab_node_encoder = encoder; + lto_output_init_mode_table (); lto_push_out_decl_state (state); for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei); lsei_next_function_in_partition (lsei)) --- gcc/tree-streamer.h.jj 2015-02-16 20:14:09.446346202 +0100 +++ gcc/tree-streamer.h2015-02-16 21:14:50.701615850 +0100 @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. #include streamer-hooks.h #include lto-streamer.h +#include data-streamer.h #include hash-map.h /* Cache of pickled nodes. Used to avoid writing the same node more @@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct void streamer_write_builtin (struct output_block *, tree); /* In tree-streamer.c. 
*/ +extern unsigned char streamer_mode_table[1 8]; void streamer_check_handled_ts_structures (void); bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree, hashval_t, unsigned *); @@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str return cache-hashes[ix]; } +static inline void +bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode) +{ + streamer_mode_table[mode] = 1; + bp_pack_enum (bp, machine_mode, 1 8, mode); +} + +static inline machine_mode +bp_unpack_machine_mode (struct bitpack_d *bp) +{ + return (machine_mode) + ((struct lto_input_block *) + bp-stream)-mode_table[bp_unpack_enum (bp, machine_mode, 1 8)]; +} #endif /* GCC_TREE_STREAMER_H */ --- gcc/lto-streamer-out.c.jj 2015-02-16 20:14:09.046352765 +0100 +++ gcc/lto-streamer-out.c 2015-02-16 20:26:23.665299091 +0100 @@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob) } +/* Init the streamer_mode_table for output, where we collect info on what + machine_mode values have been streamed. */ +void +lto_output_init_mode_table (void) +{ + memset (streamer_mode_table, '\0', MAX_MACHINE_MODE); +} + + +/* Write the mode table. */ +static void +lto_write_mode_table (void) +{ + struct output_block *ob; + ob = create_output_block (LTO_section_mode_table); + bitpack_d bp = bitpack_create (ob-main_stream); + + /* Ensure that
Re: nvptx offloading patches [3/n], RFD
On Mon, Feb 16, 2015 at 10:35:30PM +0100, Richard Biener wrote: Seeing the real format string you introduce I wonder if identifying modes by their names wouldn't work in 99% of all cases (apart from PSImode maybe). There are various corner cases. Plus of course sometimes insignificant, but sometimes very significant, floating mode changes. SFmode on one target might be completely different from another target. Also for most cases we can construct the machine mode from the type. Or where that is not possible stream the extra info that is necessary instead. I thought we've discussed that already on IRC. E.g. decimal modes are identified only by mode and nothing else, and it doesn't look like it can be easily derived from types in many cases (spent quite some time on that). Overall feels like a hack BTW :) can't we assign machine mode enum IDs in a target independent way? I mean, it doesn't have to be densely allocated? We iterate over modes, we have tons of tables indexed by modes, so if we introduce gaps, we'll make the compiler bigger and slower. If this is limited to the offloading path, like in the attached updated patch, the overhead for native LTO should be not measurable. 
--- gcc/passes.c.jj 2015-02-16 22:18:33.219702315 +0100
+++ gcc/passes.c 2015-02-16 22:19:20.842917807 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj 2015-02-16 22:18:33.222702266 +0100
+++ gcc/tree-streamer.h 2015-02-16 22:19:20.843917791 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
                                  hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+           ((struct lto_input_block *)
+            bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif /* GCC_TREE_STREAMER_H */
--- gcc/lto-streamer-out.c.jj 2015-02-16 22:18:33.204702562 +0100
+++ gcc/lto-streamer-out.c 2015-02-16 22:20:06.659163066 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+     also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    if (streamer_mode_table[i])
+      {
+        machine_mode m = (machine_mode) i;
+        if (GET_MODE_INNER (m) != VOIDmode)
+          streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+      }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+     so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+      if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
+        {
+          machine_mode m = (machine_mode) i;
+          if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+            continue;
+          bp_pack_value (&bp, m, 8);
+          bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
+          bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
+          bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
+          bp_pack_value (&bp, GET_MODE_INNER (m), 8);
+          bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
+          switch (GET_MODE_CLASS (m))
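For readers following the ordering logic in lto_write_mode_table, here is a minimal standalone sketch in C. It is not GCC code: NO_INNER, mark_inner_modes and order_modes are hypothetical names, a plain int array stands in for GET_MODE_INNER, and the VOIDmode/BLKmode exclusion is left out. It only illustrates the two steps visible in the patch: marking the inner mode of every used mode, then emitting modes without an inner mode before the modes that refer to one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "no inner mode"; VOIDmode plays this role in
   the patch.  */
#define NO_INNER (-1)

/* Step 1: if a used mode has an inner mode, mark the inner mode as used
   too, so that its description gets emitted as well.  */
static void
mark_inner_modes (unsigned char *used, const int *inner, size_t nmodes)
{
  for (size_t i = 0; i < nmodes; i++)
    if (used[i] && inner[i] != NO_INNER)
      {
        assert ((size_t) inner[i] < nmodes);
        used[inner[i]] = 1;
      }
}

/* Step 2: write the indices of the used modes to OUT, modes without an
   inner mode first (pass 0), then modes with one (pass 1).  Returns the
   number of entries written; OUT must have room for NMODES entries.  */
static size_t
order_modes (const unsigned char *used, const int *inner, size_t nmodes,
             size_t *out)
{
  size_t n = 0;
  for (int pass = 0; pass < 2; pass++)
    for (size_t i = 0; i < nmodes; i++)
      if (used[i])
        {
          /* Same test as in the patch: skip the entry unless its
             "has no inner mode" status matches the current pass.  */
          if ((inner[i] == NO_INNER) ^ (pass == 0))
            continue;
          out[n++] = i;
        }
  return n;
}
```

With this ordering a reader can resolve every inner-mode reference against an entry it has already seen, which appears to be the point of the "so that we can refer to them afterwards" comment.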
Re: nvptx offloading patches [3/n], RFD
Hi!

On Mon, Feb 09, 2015 at 11:20:00AM +0100, Richard Biener wrote:
> I think (also communicated that on IRC) we should instead try not
> streaming machine-modes at all but generating them at stream-in time
> via layout_type or layout_decl.

Here is a WIP prototype for being able to stream a machine mode
description table and streaming it back in.

In the end, I'd like to stream this out only for lto_stream_offload_p
and stream it in only for ACCEL_COMPILER reading in when available, but
wanted to see what it does even for native LTO. For that it doesn't
work very well, because it seems that wpa phase doesn't stream in some
sections and stream them out again, but instead somehow copies them
directly to the output object, so the mode table isn't aware of the
modes used in there that were bypassed this way.

Anyway, the question is if for offloading we use wpa stage at all these
days or not at all, if there is a way for ACCEL_COMPILER to
differentiate somehow between LTO sections written by the host compiler
and LTO sections perhaps created by the offloading compiler when trying
to LTO the thing (if it does it at all). Because obviously the host
compiler written LTO (in .gnu.offload_lto_*) would need the machine
modes translated, while LTO streamed already by the ACCEL_COMPILER (if
any) generally would already use the offloading target machine modes
and therefore should be treated as native lto (.gnu.lto_*).

If we don't try to write .gnu.offload_lto_* again, I think following
patch with additionally not calling lto_write_mode_table for
!lto_stream_offload_p and not calling lto_input_mode_table for
!ACCEL_COMPILER - instead build a single shared identity table - might
actually work.

Thoughts on this?

Bernd/Thomas, do you plan to commit the other approved patches soon?
--- gcc/passes.c.jj 2015-02-16 20:14:09.477345693 +0100
+++ gcc/passes.c 2015-02-16 20:26:23.659299189 +0100
@@ -2460,6 +2460,7 @@ ipa_write_summaries_1 (lto_symtab_encode
   struct lto_out_decl_state *state = lto_new_out_decl_state ();
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
 
   gcc_assert (!flag_wpa);
@@ -2581,6 +2582,7 @@ ipa_write_optimization_summaries (lto_sy
   lto_symtab_encoder_iterator lsei;
   state->symtab_node_encoder = encoder;
 
+  lto_output_init_mode_table ();
   lto_push_out_decl_state (state);
   for (lsei = lsei_start_function_in_partition (encoder);
        !lsei_end_p (lsei); lsei_next_function_in_partition (lsei))
--- gcc/tree-streamer.h.jj 2015-02-16 20:14:09.446346202 +0100
+++ gcc/tree-streamer.h 2015-02-16 21:14:50.701615850 +0100
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 
 #include "streamer-hooks.h"
 #include "lto-streamer.h"
+#include "data-streamer.h"
 #include "hash-map.h"
 
 /* Cache of pickled nodes.  Used to avoid writing the same node more
@@ -91,6 +92,7 @@ void streamer_write_integer_cst (struct
 void streamer_write_builtin (struct output_block *, tree);
 
 /* In tree-streamer.c.  */
+extern unsigned char streamer_mode_table[1 << 8];
 void streamer_check_handled_ts_structures (void);
 bool streamer_tree_cache_insert (struct streamer_tree_cache_d *, tree,
                                  hashval_t, unsigned *);
@@ -119,5 +121,19 @@ streamer_tree_cache_get_hash (struct str
   return cache->hashes[ix];
 }
 
+static inline void
+bp_pack_machine_mode (struct bitpack_d *bp, machine_mode mode)
+{
+  streamer_mode_table[mode] = 1;
+  bp_pack_enum (bp, machine_mode, 1 << 8, mode);
+}
+
+static inline machine_mode
+bp_unpack_machine_mode (struct bitpack_d *bp)
+{
+  return (machine_mode)
+           ((struct lto_input_block *)
+            bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
+}
 
 #endif /* GCC_TREE_STREAMER_H */
--- gcc/lto-streamer-out.c.jj 2015-02-16 20:14:09.046352765 +0100
+++ gcc/lto-streamer-out.c 2015-02-16 20:26:23.665299091 +0100
@@ -2642,6 +2642,96 @@ produce_symtab (struct output_block *ob)
 }
 
 
+/* Init the streamer_mode_table for output, where we collect info on what
+   machine_mode values have been streamed.  */
+void
+lto_output_init_mode_table (void)
+{
+  memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
+}
+
+
+/* Write the mode table.  */
+static void
+lto_write_mode_table (void)
+{
+  struct output_block *ob;
+  ob = create_output_block (LTO_section_mode_table);
+  bitpack_d bp = bitpack_create (ob->main_stream);
+
+  /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
+     also the inner mode marked.  */
+  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
+    if (streamer_mode_table[i])
+      {
+        machine_mode m = (machine_mode) i;
+        if (GET_MODE_INNER (m) != VOIDmode)
+          streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
+      }
+  /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
+     so that we can refer to them afterwards.  */
+  for (int pass = 0; pass < 2; pass++)
+    for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
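The bp_pack_machine_mode / bp_unpack_machine_mode pair in the patch, together with the "single shared identity table" idea from the message, can be illustrated with a small standalone sketch in C. Everything here is a simplification and an assumption for illustration: struct sketch_stream is a hypothetical byte buffer standing in for GCC's bitpack and lto_input_block, and used_modes, init_stream, pack_mode, unpack_mode and init_identity_table are made-up names. The only things taken from the patch are the 1 << 8 table size, the "mark the mode as used when writing" step, and the "translate through a per-input table when reading" step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_MODE_SLOTS (1 << 8)   /* same table size as in the patch */

/* Writer side: which mode values have been streamed so far.  */
static unsigned char used_modes[NUM_MODE_SLOTS];

/* Hypothetical byte stream standing in for GCC's bitpack.  */
struct sketch_stream
{
  unsigned char data[64];
  size_t len;   /* bytes written so far */
  size_t pos;   /* next byte to read */
  /* Reader side: maps a streamed mode value to a local mode value.  */
  unsigned char mode_table[NUM_MODE_SLOTS];
};

/* Native LTO case: every mode maps to itself.  */
static void
init_identity_table (unsigned char *table)
{
  for (int i = 0; i < NUM_MODE_SLOTS; i++)
    table[i] = (unsigned char) i;
}

/* Reset the stream and start with the identity mapping.  */
static void
init_stream (struct sketch_stream *s)
{
  memset (s, 0, sizeof *s);
  init_identity_table (s->mode_table);
}

/* Record MODE as used and write it as one byte.  */
static void
pack_mode (struct sketch_stream *s, unsigned char mode)
{
  assert (s->len < sizeof s->data);
  used_modes[mode] = 1;
  s->data[s->len++] = mode;
}

/* Read one byte and translate it through the reader's table.  */
static unsigned char
unpack_mode (struct sketch_stream *s)
{
  assert (s->pos < s->len);
  return s->mode_table[s->data[s->pos++]];
}
```

In this sketch a native reader keeps the identity table, so unpack_mode returns exactly what pack_mode wrote, while an offload reader would overwrite entries of mode_table before reading and get translated values from the same stream.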
Re: nvptx offloading patches [3/n], RFD
On Wed, Feb 4, 2015 at 12:38 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
> > This is not against current trunk; it applies to gomp-4_0-branch
> > where it is one of the necessary parts to make offloading x86-nvptx
> > work.
> >
> > The issue is that the LTO file format depends on the machine_modes
> > enum, it needs to match between host and offload target. The easiest
> > way to do this is to just use the host-modes.def when compiling an
> > offload compiler.
> >
> > Ports that want to be hosts for offloading may need to modify their
> > modes.def. The patch below contains changes to i386-modes.def which
> > modifies XFmode depending on a target switch. I'm not actually
> > entirely sure what to do about this. Do we want to make this flag an
> > error when offloading is enabled? Or maybe add float format support
> > to the -foffload-abi option? Thoughts? Ok for the first part of the
> > patch once the other offloading patches have gone in (bootstrapped
> > and tested on x86_64-linux)?
>
> I don't like this at all. IMHO instead we should stream in the
> offloading LTO sections some kind of mode description table (perhaps
> limited to the modes actually ever streamed), and when reading back
> the offloading LTO sections, let the offloading compiler remap the
> modes to its own modes where there is a mapping in between the two,
> choose some other mapping (e.g. map various vector modes the host has
> but offloading target does not to say BLKmode), or give up otherwise
> with offloading (say if you attempt to stream floating point modes the
> offloading target doesn't support etc.).
>
> So perhaps stream for each used mode the mode value, corresponding
> mode class, size, precision, inner mode, nunits, and for floating
> point modes supposedly somehow encode the real_format (perhaps just
> add a name -> struct real_format mapping for the real.c modes, and map
> anything else to unknown).

I think (also communicated that on IRC) we should instead try not
streaming machine-modes at all but generating them at stream-in time via
layout_type or layout_decl.

Richard.

> Jakub
Re: nvptx offloading patches [3/n], RFD
On Sat, Nov 01, 2014 at 12:57:45PM +0100, Bernd Schmidt wrote:
> This is not against current trunk; it applies to gomp-4_0-branch where
> it is one of the necessary parts to make offloading x86-nvptx work.
>
> The issue is that the LTO file format depends on the machine_modes
> enum, it needs to match between host and offload target. The easiest
> way to do this is to just use the host-modes.def when compiling an
> offload compiler.
>
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which
> modifies XFmode depending on a target switch. I'm not actually
> entirely sure what to do about this. Do we want to make this flag an
> error when offloading is enabled? Or maybe add float format support to
> the -foffload-abi option? Thoughts? Ok for the first part of the patch
> once the other offloading patches have gone in (bootstrapped and
> tested on x86_64-linux)?

I don't like this at all. IMHO instead we should stream in the
offloading LTO sections some kind of mode description table (perhaps
limited to the modes actually ever streamed), and when reading back the
offloading LTO sections, let the offloading compiler remap the modes to
its own modes where there is a mapping in between the two, choose some
other mapping (e.g. map various vector modes the host has but offloading
target does not to say BLKmode), or give up otherwise with offloading
(say if you attempt to stream floating point modes the offloading target
doesn't support etc.).

So perhaps stream for each used mode the mode value, corresponding mode
class, size, precision, inner mode, nunits, and for floating point modes
supposedly somehow encode the real_format (perhaps just add a
name -> struct real_format mapping for the real.c modes, and map
anything else to unknown).

Jakub
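As an illustration of the remapping proposed in this message, here is a minimal sketch in C of what the reader side could look like. It is an assumption-laden toy, not the eventual GCC implementation: enum sketch_class, struct mode_desc, SK_NONE, SK_BLK, SK_FAIL, remap_one and build_remap_table are all hypothetical names, and the real_format encoding for floating point modes is left out entirely. It matches each host description against the target's own table by class, size, precision, inner mode and nunits, falls back to a BLKmode stand-in for unmatched vector modes, and reports failure for anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for GCC's enum mode_class.  */
enum sketch_class { SK_INT, SK_FLOAT, SK_VECTOR_INT };

#define SK_NONE (-1)   /* no inner mode */
#define SK_BLK  (-2)   /* stand-in for BLKmode */
#define SK_FAIL (-3)   /* no usable mapping: give up on offloading */

/* One entry of the proposed description table.  */
struct mode_desc
{
  enum sketch_class mclass;
  unsigned size;        /* in bytes */
  unsigned precision;   /* in bits */
  int inner;            /* index into the same table, or SK_NONE */
  unsigned nunits;
};

/* Find the target mode that matches HOST, whose inner mode has already
   been remapped to REMAPPED_INNER.  Unmatched vector modes fall back to
   SK_BLK; any other unmatched mode yields SK_FAIL.  */
static int
remap_one (const struct mode_desc *host, int remapped_inner,
           const struct mode_desc *target, size_t ntarget)
{
  for (size_t i = 0; i < ntarget; i++)
    if (target[i].mclass == host->mclass
        && target[i].size == host->size
        && target[i].precision == host->precision
        && target[i].inner == remapped_inner
        && target[i].nunits == host->nunits)
      return (int) i;
  return host->mclass == SK_VECTOR_INT ? SK_BLK : SK_FAIL;
}

/* Fill OUT[i] with the target index for host mode i.  Assumes that an
   inner mode always precedes the modes that refer to it in HOST.  */
static void
build_remap_table (const struct mode_desc *host, size_t nhost,
                   const struct mode_desc *target, size_t ntarget,
                   int *out)
{
  for (size_t i = 0; i < nhost; i++)
    {
      int inner = SK_NONE;
      if (host[i].inner != SK_NONE)
        {
          assert (host[i].inner >= 0 && (size_t) host[i].inner < i);
          inner = out[host[i].inner];
        }
      out[i] = remap_one (&host[i], inner, target, ntarget);
    }
}
```

Matching on the description rather than on the enum value is what lets the two compilers number their modes differently. A caller would treat any SK_FAIL entry in the resulting table as a reason to refuse offloading for that translation unit.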
Re: nvptx offloading patches [3/n], RFD
On 11/01/14 05:57, Bernd Schmidt wrote:
> This is not against current trunk; it applies to gomp-4_0-branch where
> it is one of the necessary parts to make offloading x86-nvptx work.
>
> The issue is that the LTO file format depends on the machine_modes
> enum, it needs to match between host and offload target. The easiest
> way to do this is to just use the host-modes.def when compiling an
> offload compiler.
>
> Ports that want to be hosts for offloading may need to modify their
> modes.def. The patch below contains changes to i386-modes.def which
> modifies XFmode depending on a target switch. I'm not actually
> entirely sure what to do about this. Do we want to make this flag an
> error when offloading is enabled? Or maybe add float format support to
> the -foffload-abi option? Thoughts? Ok for the first part of the patch
> once the other offloading patches have gone in (bootstrapped and
> tested on x86_64-linux)?

It feels like we've got another real distinction to make. We've had
host, build, target and they're all independent. It feels like we need
an offload target, and a better separation between target and offload
target. Then we need to figure out the places where we've got bleed-out.

I'm not sure how to deal with this in the immediate term other than by
using a hack like this. Though I'd prefer to avoid the #ifdef, as it
seems to me this shouldn't be baked in at build/configure time.

jeff
nvptx offloading patches [3/n], RFD
This is not against current trunk; it applies to gomp-4_0-branch where
it is one of the necessary parts to make offloading x86-nvptx work.

The issue is that the LTO file format depends on the machine_modes enum,
it needs to match between host and offload target. The easiest way to do
this is to just use the host-modes.def when compiling an offload
compiler.

Ports that want to be hosts for offloading may need to modify their
modes.def. The patch below contains changes to i386-modes.def which
modifies XFmode depending on a target switch. I'm not actually entirely
sure what to do about this. Do we want to make this flag an error when
offloading is enabled? Or maybe add float format support to the
-foffload-abi option? Thoughts? Ok for the first part of the patch once
the other offloading patches have gone in (bootstrapped and tested on
x86_64-linux)?

Bernd

        * config.gcc (offload_host_cpu_type): Compute.
        (extra_modes): Use it to pick the offload host CPU's modes.def when
        building an offload target.
        * config/i386/i386-modes.def (XF): Skip adjustments when building an
        offload compiler.
Index: gomp-4_0-branch/gcc/config.gcc
===================================================================
--- gomp-4_0-branch.orig/gcc/config.gcc
+++ gomp-4_0-branch/gcc/config.gcc
@@ -483,15 +483,26 @@ tilepro*-*-*)
         ;;
 esac
 
+offload_host_cpu_type=${cpu_type}
+if test x${enable_as_accelerator} != xno
+then
+        offload_host_cpu_type=`echo ${enable_as_accelerator_for} | sed 's/-.*$//'`
+fi
+case ${offload_host_cpu_type} in
+x86_64)
+        offload_host_cpu_type=i386
+        ;;
+esac
+
 tm_file=${cpu_type}/${cpu_type}.h
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-protos.h
 then
         tm_p_file=${cpu_type}/${cpu_type}-protos.h
 fi
 extra_modes=
-if test -f ${srcdir}/config/${cpu_type}/${cpu_type}-modes.def
+if test -f ${srcdir}/config/${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 then
-        extra_modes=${cpu_type}/${cpu_type}-modes.def
+        extra_modes=${offload_host_cpu_type}/${offload_host_cpu_type}-modes.def
 fi
 if test -f ${srcdir}/config/${cpu_type}/${cpu_type}.opt
 then
Index: gomp-4_0-branch/gcc/config/i386/i386-modes.def
===================================================================
--- gomp-4_0-branch.orig/gcc/config/i386/i386-modes.def
+++ gomp-4_0-branch/gcc/config/i386/i386-modes.def
@@ -24,6 +24,9 @@ along with GCC; see the file COPYING3.
 FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* This file may be used when building a compiler for an offload target.
+   Assume that no special floating point options are used.  */
+#ifndef ACCEL_COMPILER
 /* In ILP32 mode, XFmode has size 12 and alignment 4.
    In LP64 mode, XFmode has size and alignment 16.  */
 ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
@@ -33,6 +36,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
                           : ieee_extended_intel_96_format));
 ADJUST_BYTESIZE (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
+#endif
 
 /* Add any extra modes needed to represent the condition code.