[PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL

2016-02-26 Thread Martin Jambor
Hi,

my fix for PR 69666 has caused quite a few regressions across the
board where SRA removed an SSA_NAME that was nevertheless still in the
IL (and usually stumbled upon it itself straight away).

The removal path should not be executed when there is an SSA_NAME on
the LHS; the code clearly is not ready for it.  Before my patch, we
always got lucky because the statement was simply modified elsewhere
when the LHS was an SSA_NAME.  However, even that was not 100%
guaranteed because of the !access_has_replacements_p (racc) part of
the changed condition.

The patch below fixes the ICEs simply by guarding the removal code to
only work when the LHS is not an SSA_NAME.  This means that the safe
path below it is going to execute.
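
As a rough illustration of the guard, here is a standalone toy model of
the condition.  It is not code from tree-sra.c: the struct and the
function name are made up for this sketch, and the two fields merely
stand in for access_has_children_p (racc) and
racc->grp_unscalarized_data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of an SRA access that the guard
   looks at.  */
struct toy_access
{
  bool has_children;       /* models access_has_children_p (racc) */
  bool unscalarized_data;  /* models racc->grp_unscalarized_data */
};

/* Return true if the "remove the load" path may be taken.  With the
   fix, an SSA_NAME on the LHS always rules it out, so the safe path
   further down is used instead.  */
bool
toy_may_remove_load (const struct toy_access *racc, bool lhs_is_ssa_name)
{
  assert (racc != NULL);
  return racc->has_children
         && !racc->unscalarized_data
         && !lhs_is_ssa_name;
}
```

With has_children set and unscalarized_data clear, the function returns
true for a non-SSA_NAME LHS and false for an SSA_NAME one, which is the
only behavioural change the patch intends.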

I have bootstrapped and tested the patch on x86_64-linux.  I'd like to
commit it to trunk as soon as it gets approved and then I'd like to
commit it to gcc-5 branch together with the PR 69666 fix a few days
afterwards.  OK?

Thanks,

Martin


2016-02-26  Martin Jambor  

PR middle-end/69920
* tree-sra.c (sra_modify_assign): Do not remove loads of
uninitialized aggregates to SSA_NAMEs.

testsuite/
* gcc.dg/torture/pr69932.c: New test.
* gcc.dg/torture/pr69936.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++
 gcc/testsuite/gcc.dg/torture/pr69936.c | 24 
 gcc/tree-sra.c |  3 ++-
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c b/gcc/testsuite/gcc.dg/torture/pr69932.c
new file mode 100644
index 000..4b82130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+void fn1() {
+  int b = 4;
+  short c[4];
+  c[b] = c[a];
+  if (c[2]) {}
+
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c b/gcc/testsuite/gcc.dg/torture/pr69936.c
new file mode 100644
index 000..3023bbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a;
+char b;
+void fn1(int p1) {}
+
+int fn2() { return 5; }
+
+void fn3() {
+  if (fn2())
+;
+  else {
+char c[5];
+c[0] = 5;
+  lbl_608:
+fn1(c[9]);
+int d = c[9];
+c[2] | a;
+d = c[b];
+  }
+  goto lbl_608;
+}
+
+int main() { return 0; }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 663ded2..366f413 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   else
{
  if (access_has_children_p (racc)
- && !racc->grp_unscalarized_data)
+ && !racc->grp_unscalarized_data
+ && TREE_CODE (lhs) != SSA_NAME)
{
  if (dump_file)
{
-- 
2.7.1



Re: (Non-)offloading diagnostics

2016-02-26 Thread Martin Jambor
Hi,

On Fri, Feb 26, 2016 at 05:46:33PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> In light of the -Whsa testsuite patches just posted, I think we first
> need to clarify the general policy questions I posted a month ago:
> 
> On Tue, 26 Jan 2016 11:46:14 +0100, I wrote:
> > On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor  wrote:
> > > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote:
> > > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> > > > > [...]  There are no failing
> > > > > testcases if HSA is not configured.  If it is, there are some, all of
> > > > > which fall into one the following categories:
> > > > > 
> > > > >   1) HSA cannot compile a function for one reason or another (most
> > > > >  common cause is inability of HSA to take an address of a function
> > > > >  or make an indirect call) and gives a warning, which is regarded
> > > > >  as an "excess error" by dejagnu.
> > 
> > Confirmed:
> > 
> > [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function 
> > 'bar._omp_fn.26.hsa.31':
> > cc1: warning: could not emit HSAIL for the function [-Whsa]
> > cc1: note: support for HSA does not implement non-gridified OpenMP 
> > parallel constructs.
> > [...]
> > 
> > ..., and many more.  So, with --enable-offload-targets=[...],hsa we
> > regress (PASS -> FAIL; "test for excess errors") such compile tests.
> > 
> > > > It would be good if there is a -W* switch to turn such warnings off.
> > > > Not just for the purposes of dejagnu libgomp testing, but say one
> > > > might try to compile a program primarily say for XeonPhi or PTX 
> > > > offloading,
> > > > but have HSA enabled to, but care primarily about the former two, etc.
> > > 
> > > All these warnings are in the -Whsa group and can be suppressed with
> > > -Wno-hsa.
> > 
> > These compile tests are done without any -W* flags; -Whsa is enabled by
> > default.
> 
> I'm a proponent of enabling as many useful warnings by default, or if not
> by default, then with -Wall.  -Whsa is enabled by default, and has thus
> set a precedent of doing that.

I am not sure I'd go as far as "as many as possible," but in the case
of -Whsa, the warnings get emitted only if HSA offloading is
configured and especially only if the user used OMP and its target
construct.  This means that it is relevant only for a rather small
class of users and it's not a "your code looks weird" kind of warning
but a "the compiler is not doing what you clearly asked for" warning.
So that is why we decided to warn unconditionally.

But as far as I understand, gcc does not give any promises about
warnings, so I believe decisions like a defaultness of a warning can
be revisited at any point in the future, for example if people learn
not to expect some constructs to be offloaded to GPUs.  Moreover, the
conventions regarding offloading are still being settled and will be
for quite some time, so nobody should really expect such details
to be set in stone.

> 
> > How to address this mismatch?  Put -Wno-has into all regressing
> > test case files individually?  Run the affected testsuites with -Wno-hsa?
> > Not enable -Whsa by default (but I agree it's useful to users)?
> > (Instead, enable with -Wall, which any sane user should be specifying?)
> 
> Even if a bit tedious, my preference actually is to add to the test cases
> an (expected) dg-warning everywhere where such a non-offloading warning
> currently triggers, because that's what users will be seeing (with -Whsa
> enabled by default), and because that will make it obvious (PASS -> FAIL
> for the warning check) when that warning disappears (say, because the
> compiler can now offload the respective construct, yay).

That is my opinion as well, except that given the number of warnings
now (with dynamic parallelism disabled), I prefer to work on the file
granularity.  Also, often testcases use macros heavily and putting
dg-warning into them is somewhere between weird and outright
impossible.

On the other hand, as you have probably noticed, Jakub asked me to
pass -Wno-hsa to all tests instead so he seems to have the opposing
point of view.  I must say that I am not really ready to argue about
this too much, especially if we have our own HSA testsuite directory.

Martin

> 
> > A very similar problem also exists for nvptx offloading (Nathan CCed),
> > where we emit similar warnings (enabled by default).  As nvptx offloading
> > happens during link-time (not compile-time, as with hsa offloading),
> > these don't affect GCC's compile tests, but need to be worked around in
> > libgomp test cases.
> 
> 
> Grüße
>  Thomas


Re: (Non-)offloading diagnostics

2016-02-26 Thread Martin Jambor
Hi,

On Fri, Feb 26, 2016 at 06:51:34PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote:
> > > I'm a proponent of enabling as many useful warnings by default, or if not
> > > by default, then with -Wall.  -Whsa is enabled by default, and has thus
> > > set a precedent of doing that.
> > 
> > I am not sure I'd go as far as "as many as possible," but in the case
> > of -Whsa, the warnings get emitted only if HSA offloading is
> > configured and especially only if the user used OMP and its target
> > construct.  This means that it is relevant only for a rather small
> > class of users and it's not a "your code looks weird" kind of warning
> > but a "the compiler is not doing what you clearly asked for" warning.
> > So that is why we decided to warn unconditionally.
> > 
> > But as far as I understand, gcc does not give any promises about
> > warnings, so I believe decisions like a defaultness of a warning can
> > be revisited at any point in the future, for example if people learn
> > not to expect some constructs to be offloaded to GPUs.  Moreover, the
> > conventions regarding offloading are still being settled and still
> > will for quite some time so nobody should really expect such details
> > to be set in stone.
> 
> The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite
> are (meant to be) valid OpenMP testcases, having them full of dozens of
> dg-warning lines where every of the 10+ different offloading target warns
> about something would be a maintainance nightmare.

Agreed, having such dg-warnings would definitely be overkill.  I
only intended to mark the whole test with an option.  I am willing to
look for new hsa warnings and examine them myself, adding the
option if necessary.  I would not expect the originator of the
testcase or anybody who does not care for HSA to do it.

> E.g. when adding
> new OpenMP tests, one would need to configure all the offloaders
> (individually?), for some you need hw not every committer has,

No special hardware is necessary to see the warning (though you need
at least https://github.com/HSAFoundation/HSA-Runtime-Reference-Source
to build the libgomp plugin).  Once gcc decides to emit HSAIL, it of
course has to work as expected.  There should be no need to configure
hsa individually either.

> for others
> there are other issues (e.g., is the required amdkfd going to be submitted
> for upstream kernel?  I might have hard time convincing our kernel
> maintainers to use that instead of what is in upstream kernel others).

I am being repeatedly told it will be and very soon, but apparently it
takes longer than AMD anticipated.  I don't think anybody expects any
distribution to pick it up on their own (...but you know, Red Hat is
actually the company that now employs the upstream kernel kfd
maintainer ;-).

> So, IMHO if you want to check for warnings, do that as Martin has added a
> new subdir with only hsa OpenMP tests, if you want test warnings on tests we
> already have elsewhere, #include them in the other dir, dg-do link instead
> of run (so that it is not run multiple times), and check for the warnings;
> you could also use -foffload=hsa in there to make sure you only have to care
> about hsa warnings, and not NVPTX, or whatever other offloader.
> 

Just to be clear, I never wanted to be testing for the presence of
warnings; I see no value in that.

All in all, I am willing to add -Wno-hsa to default options and only
have these warnings on in dedicated HSA directories.  I will amend the
posted patches once testsuite maintainers look at my initial proposal
for the first such directory.

Martin



Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-02-26 Thread Martin Jambor
Hi,

I hope I've got some good news:

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek  writes:
>
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description.  It is within a single header file
> >> describing the binary structures and constants of the format.
> >>
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >>
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >>
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC.  I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
>
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.
>

I have asked the HSA Foundation to do just that and apparently they agreed
to change the licensing of the file (in upcoming versions of HSA) to
the MIT license.  IIUC, the reading of the license header would then
be the one below.  I hope that means the problematic requirements will
be gone and we will be able to just use their file.  If, however, you
still think there will be issues preventing us from doing that, please
let me know as soon as possible.

Thanks,

Martin



The license is going to be:

The MIT License (MIT)

Copyright (c) 2016, HSA Foundation, Inc

 * All rights reserved.



Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-03-01 Thread Martin Jambor
Hi,

On Fri, Feb 26, 2016 at 05:07:56PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 26, 2016 at 04:59:57PM +0100, Martin Jambor wrote:
> > just like with the compiler gomp testsuite, we need to add -Wno-hsa to
> > options when compiling libgomp testcases in order not to have "excess
> > errors" failures when HSA is enabled.

...
> 
> I don't like this very much.
> Couldn't you instead add -Wno-hsa next to -fopenmp in *.exp, and just where
> you want to explicitly check the hsa warnings, enable it manually in
> dg-options or dg-additional-options (it would need to be guarded with hsa
> being enabled etc. anyway).
> 

as Jakub requested, this patch deals with HSA "excess errors" in the
libgomp library testsuite by passing -Wno-hsa to all of them.  IIUC,
passing it in the second parameter of dg-runtest (as opposed to
the third) means that it will apply even to tests that have their own
dg-options, which is presumably easier for everyone, provided that hsa
gets its own libgomp testsuite directories.

OK for trunk?

Thanks,

Martin

2016-02-29  Martin Jambor  

* testsuite/libgomp.c/c.exp: Pass -Wno-hsa to all tests.
* testsuite/libgomp.c++/c++.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Likewise.
---
 libgomp/testsuite/libgomp.c++/c++.exp | 2 +-
 libgomp/testsuite/libgomp.c/c.exp | 2 +-
 libgomp/testsuite/libgomp.fortran/fortran.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 0454f95..120e573 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -65,7 +65,7 @@ if { $lang_test_file_found } {
 }
 
 # Main loop.
-dg-runtest $tests "" "$libstdcxx_includes $DEFAULT_CFLAGS"
+dg-runtest $tests "-Wno-hsa" "$libstdcxx_includes $DEFAULT_CFLAGS"
 }
 
 # All done.
diff --git a/libgomp/testsuite/libgomp.c/c.exp b/libgomp/testsuite/libgomp.c/c.exp
index 300b921..d3cd144 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -31,7 +31,7 @@ append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
 set_ld_library_path_env_vars
 
 # Main loop.
-dg-runtest $tests "" $DEFAULT_CFLAGS
+dg-runtest $tests "-Wno-hsa" $DEFAULT_CFLAGS
 
 # All done.
 dg-finish
diff --git a/libgomp/testsuite/libgomp.fortran/fortran.exp b/libgomp/testsuite/libgomp.fortran/fortran.exp
index 9e6b643..ea84d5c 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -66,7 +66,7 @@ if { $lang_test_file_found } {
 # For Fortran we're doing torture testing, as Fortran has far more tests
 # with arrays etc. that testing just -O0 or -O2 is insufficient, that is
 # typically not the case for C/C++.
-gfortran-dg-runtest $tests "" ""
+gfortran-dg-runtest $tests "-Wno-hsa" ""
 }
 
 # All done.
-- 
2.7.1




Re: [hsa, testsuite] Suppress hsa warnings in compiler gomp tests

2016-03-01 Thread Martin Jambor
Hi,

as Jakub requested in another thread, this patch deals with HSA
"excess errors" in the gomp compiler testsuite by passing -Wno-hsa to
all of them.  IIUC, passing it in the second parameter of
*-dg-runtest (as opposed to the third) means that it will apply even
to tests that have their own dg-options, which is presumably easier for
everyone, provided that hsa gets its own libgomp testsuite
directories.

OK for trunk?

Thanks,

Martin

2016-02-29  Martin Jambor  

* g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
* gcc.dg/gomp/gomp.exp: Likewise.
* gfortran.dg/gomp/gomp.exp: Likewise.
---
 gcc/testsuite/g++.dg/gomp/gomp.exp  | 2 +-
 gcc/testsuite/gcc.dg/gomp/gomp.exp  | 2 +-
 gcc/testsuite/gfortran.dg/gomp/gomp.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gomp/gomp.exp b/gcc/testsuite/g++.dg/gomp/gomp.exp
index 7365389..bee5441 100644
--- a/gcc/testsuite/g++.dg/gomp/gomp.exp
+++ b/gcc/testsuite/g++.dg/gomp/gomp.exp
@@ -29,7 +29,7 @@ dg-init
 # Main loop.
 g++-dg-runtest [lsort [concat \
[find $srcdir/$subdir *.C] \
-   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+   [find $srcdir/c-c++-common/gomp *.c]]] "-Wno-hsa" "-fopenmp"
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/gomp/gomp.exp b/gcc/testsuite/gcc.dg/gomp/gomp.exp
index 78623fc..d0889c5 100644
--- a/gcc/testsuite/gcc.dg/gomp/gomp.exp
+++ b/gcc/testsuite/gcc.dg/gomp/gomp.exp
@@ -31,7 +31,7 @@ dg-init
 # Main loop.
 dg-runtest [lsort [concat \
[find $srcdir/$subdir *.c] \
-   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+   [find $srcdir/c-c++-common/gomp *.c]]] "-Wno-hsa" "-fopenmp"
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gfortran.dg/gomp/gomp.exp b/gcc/testsuite/gfortran.dg/gomp/gomp.exp
index 625361b..78d70b5 100644
--- a/gcc/testsuite/gfortran.dg/gomp/gomp.exp
+++ b/gcc/testsuite/gfortran.dg/gomp/gomp.exp
@@ -30,7 +30,7 @@ dg-init
 
 # Main loop.
 gfortran-dg-runtest [lsort \
-   [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenmp"
+   [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "-Wno-hsa" "-fopenmp"
 
 # All done.
 dg-finish
-- 
2.7.1



Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-03-01 Thread Martin Jambor
Hi

On Tue, Mar 01, 2016 at 07:47:49PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 01, 2016 at 07:39:18PM +0100, Martin Jambor wrote:
> > as Jakub requested, this patch deals with HSA "excess errors" in the
> > libgomp library testsuite by passing -Wno-hsa to all of them.  IIUC,
> > that passing it in the second parameter of dg-runtest (as opposed to
> > the third) means that it will apply even tests that have their own
> > dg-options, which is presumably easier for everyone, provided that hsa
> > will get is own libgomp testsuite directories.
> 
> What is the difference betwee the $flags and $default-extra-cflags
> arguments to dg-runtest?

well, exactly what I wrote in the original email and what you have
quoted (and me as well) above.  But let me quote the dejagnu source
comment of dg-runtest, which is perhaps more clear:

  # FLAGS is a set of options to always pass.
  # DEFAULT_EXTRA_FLAGS is a set of options to pass if the testcase
  # doesn't specify any (with dg-option).

So if I changed DEFAULT_EXTRA_FLAGS rather than FLAGS, I'd have to go
through all testcases specifying dg-options and add -Wno-hsa there
too.  Moreover, we'd have to add -Wno-hsa to all appropriate future
testcases if they specify their own dg-options.
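
To make the difference concrete, here is a small standalone C model of
how the option string for one test ends up being composed.  It is only
a sketch of the behaviour described in the comment above, not dejagnu
code; the function names are invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of how one test's option string is put together.
   TEST_DG_OPTIONS is NULL when the testcase has no dg-options.  */
int
toy_compose_options (char *buf, size_t size, const char *flags,
                     const char *default_extra_flags,
                     const char *test_dg_options)
{
  assert (buf != NULL && size > 0);
  /* dg-options, when present, replaces DEFAULT_EXTRA_FLAGS ...  */
  const char *extra = test_dg_options ? test_dg_options
                                      : default_extra_flags;
  /* ... while FLAGS is passed unconditionally.  */
  return snprintf (buf, size, "%s %s", flags, extra);
}

/* Return nonzero if OPT occurs in OPTIONS (plain substring match).  */
int
toy_has_option (const char *options, const char *opt)
{
  return strstr (options, opt) != NULL;
}
```

In this model, putting -Wno-hsa into FLAGS keeps it for a test whose
dg-options is "-O2 -fopenmp", whereas putting it into
DEFAULT_EXTRA_FLAGS loses it for that same test.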

Perhaps we should be using dg-additional-options in libgomp testsuite
wherever possible but there certainly are testcases using dg-options.

> You seem to stick -Wno-hsa into the former,
> which to me looks like it will be mentioned as part of the test
> names (e.g. when cycling through -O* options, -Wno-hsa would be printed
> along with -O2 etc.)?

Yes, that is an unfortunate side-effect. Furthermore, automated
comparison scripts might be confused by the change (mine was,
reporting all testcases as newly passed/xfailed and old as
disappeared).

But again, I do not have a strong preference.  I can change the patches
to use DEFAULT_EXTRA_FLAGS and am willing to watch for fallout
and fix dg-options if you prefer that.  So let me know what you
consider nicer and I'll do it.

Thanks,

Martin


Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-03-04 Thread Martin Jambor
Hi,

On Tue, Mar 01, 2016 at 11:06:43PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 01, 2016 at 10:47:46PM +0100, Martin Jambor wrote:
> > well, exactly what I wrote in the original email and what you have
> > quoted (and me as well) above.  But let me quote the dejagnu source
> > comment of dg-runtest, which is perhaps more clear:
> > 
> >   # FLAGS is a set of options to always pass.
> >   # DEFAULT_EXTRA_FLAGS is a set of options to pass if the testcase
> >   # doesn't
> >   # specify any (with dg-option).
> > 
> > So if I changed DEFAULT_EXTRA_FLAGS rather than FLAGS, I'd have to go
> > through all testcases specifying dg-options and add -Wno-hsa there
> > too.  Moreover, we'd have to add -Wno-hsa to all appropriate future
> > testcases if they specify their own dg-options.
> 
> Ah, ok; what about adding
> # Disable HSA warnings by default.
> lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
> in libgomp/testsuite/lib/libgomp.exp (next to e.g.
> -fno-diagnostics-show-caret)?
> 

That works nicely (though I have to override it explicitly in the
libgomp.hsa.c directory with another -Whsa, but I guess we can live
with that).  So I will use the above for the libgomp case.

I have tried to come up with a similar alternative for
gcc.dg/gomp/gomp.exp, g++.dg/gomp/gomp.exp and gfortran.dg/gomp/gomp.exp
but so far I have not managed to make the C++ and Fortran cases work
in any other way but to pass -Wno-hsa in FLAGS (and thus change the
name).  For C, adding the following before the main loop works, even
though it looks too much like a hack to me:

global TEST_ALWAYS_FLAGS
set TEST_ALWAYS_FLAGS [concat $TEST_ALWAYS_FLAGS "-Wno-hsa"]

However, the C++ and Fortran cases use gfortran-dg-runtest to cycle
through a set of torture options and I have not yet discovered the
right magic variable to set (for example, adding -Wno-hsa to
DG_TORTURE_OPTIONS elements does not work).

I'm afraid I have spent way too much time on this already, so unless
someone has any ideas, I'd suggest that we use the (already approved)
name-changing gomp patch as it is.  Or at least for C++ and Fortran.

Thanks,

Martin


Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-03-04 Thread Martin Jambor
Hi,

On Fri, Mar 04, 2016 at 04:31:29PM +0100, Jakub Jelinek wrote:
> On Fri, Mar 04, 2016 at 04:27:11PM +0100, Martin Jambor wrote:
> > > Ah, ok; what about adding
> > > # Disable HSA warnings by default.
> > > lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
> > > in libgomp/testsuite/lib/libgomp.exp (next to e.g.
> > > -fno-diagnostics-show-caret)?
> > > 
> > 
> > That works nicely (though I have to override it explicitely in the
> > libgomp.hsa.c directory with another -Whsa, but I guess we can live
> > with that).  So I will use the above for the libgomp case.
> 
> Ok.
> 
> > I have tried to come up with a similar alternative for
> > gcc.dg/gomp/gomp.exp, g++.dg/gomp/gomp.exp and gfortran/gomp/gomp.exp
> > but so far I have not achieved to make the C++ and Fortran cases work
> > in any other way but pass -Wno-hsa in FLAGS (and thus change the
> > name).  For C, adding the following before the main loop works, even
> > though it looks too much like a hack to me:
> > 
> > global TEST_ALWAYS_FLAGS
> > set TEST_ALWAYS_FLAGS [concat $TEST_ALWAYS_FLAGS "-Wno-hsa"]
> 
> Doesn't this also cause the -Wno-hsa option on all further tests executed by
> other *.exp after gomp.exp by the same runtest invocation?
> 

Not in the limited runs that I experimented with so far, but I
certainly kept this possibility in mind too.  If so, I would either
set it back before invoking dg-finish or dismiss the whole idea.

> > However, the C++ and Fortran cases use gfortran-dg-runtest to cycle
> > through a set of torture options and I have not yet discovered the
> > right magic variable to set (for example, adding -Wno-hsa to
> > DG_TORTURE_OPTIONS elements does not work).
> > 
> > I'm afraid I have spent way too much time on this already, so unless
> > someone has any ideas, I'd suggest that we use the (already approved)
> > name-changing gomp patch as it is.  Or at least for C++ and Fortran.
> 
> Do you have URL for what you refer to?
> 

Sure, the patch has been posted here:

https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00071.html

and approved here:

https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00074.html

Martin


Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests

2016-03-04 Thread Martin Jambor
On Fri, Mar 04, 2016 at 05:04:31PM +0100, Jakub Jelinek wrote:
> On Fri, Mar 04, 2016 at 05:01:34PM +0100, Martin Jambor wrote:
> > Not in the limited runs that I experimented with so far, but I
> > certainly kept this possibility in mind too.  If so, I would either
> > set it back before invoking dg-finish or dismiss the whole idea.
> > 
> > > > However, the C++ and Fortran cases use gfortran-dg-runtest to cycle
> > > > through a set of torture options and I have not yet discovered the
> > > > right magic variable to set (for example, adding -Wno-hsa to
> > > > DG_TORTURE_OPTIONS elements does not work).
> > > > 
> > > > I'm afraid I have spent way too much time on this already, so unless
> > > > someone has any ideas, I'd suggest that we use the (already approved)
> > > > name-changing gomp patch as it is.  Or at least for C++ and Fortran.
> > > 
> > > Do you have URL for what you refer to?
> > > 
> > 
> > Sure, the patch has been posted here:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00071.html
> 
> For the g*.dg/gomp/, if you'd only move -Wno-hsa into the last argument
> next to -fopenmp, how many tests would be affected?

Out of 287 files that have dg-options with them in the gomp
directories, only 9 generate hsa warnings:

  c-c++-common/gomp/clauses-1.c:/* { dg-options "-fopenmp" } */
  c-c++-common/gomp/if-1.c:/* { dg-options "-fopenmp" } */
  c-c++-common/gomp/pr61486-2.c:/* { dg-options "-fopenmp" } */
  c-c++-common/gomp/target-teams-1.c:/* { dg-options "-fopenmp -fdump-tree-gimple" } */
  g++.dg/gomp/target-teams-1.C:// { dg-options "-fopenmp -fdump-tree-gimple" }
  gcc.dg/gomp/pr68128-2.c:/* { dg-options "-O2 -fopenmp -fdump-tree-omplower" } */
  gfortran.dg/gomp/target1.f90:! { dg-options "-fopenmp" }
  gfortran.dg/gomp/target2.f90:! { dg-options "-fopenmp -ffree-line-length-160" }
  gfortran.dg/gomp/target3.f90:! { dg-options "-fopenmp" }

> If not really many,
> perhaps those could be changed to use dg-additional-options instead of
> dg-options.

I do not know what -ffree-line-length-160 is, but probably all of
them, even though putting -O2 in gcc.dg/gomp/pr68128-2.c to
"additional" flags feels just wrong.

However, the real question is: Would such a solution really be much
better than the first version of the patch
(https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01813.html)?  After
all, in comparison it would only avoid touching two tests and it will
not avoid issues with tests added in future if they use dg-options.

Martin


Backport fix of PR 69666 and PR 69920 to gcc-5 branch

2016-03-04 Thread Martin Jambor
Hi,

a week has passed with the PR 69920 fix in and it seems to have fixed all
issues caused by the fix to PR 69666, which I have reverted on the
gcc-5 branch.

So I am going to un-do that revert and backport the PR 69920 fix in
one commit to the branch, after final bootstrap and testing runs
finish (actually, it has passed successfully on x86_64-linux, there is
one on i686 that is still running).

Thanks,

Martin


2016-03-03  Martin Jambor  

PR tree-optimization/69666
PR middle-end/69920
* tree-sra.c (sra_modify_assign): Do not attempt to create
default_def replacements for unscalarizable regions.  Do not
remove loads of uninitialized aggregates to SSA_NAMEs.

testsuite/
* gcc.dg/torture/pr69932.c: New test.
* gcc.dg/torture/pr69936.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c b/gcc/testsuite/gcc.dg/torture/pr69932.c
new file mode 100644
index 000..4b82130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+void fn1() {
+  int b = 4;
+  short c[4];
+  c[b] = c[a];
+  if (c[2]) {}
+
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c b/gcc/testsuite/gcc.dg/torture/pr69936.c
new file mode 100644
index 000..3023bbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a;
+char b;
+void fn1(int p1) {}
+
+int fn2() { return 5; }
+
+void fn3() {
+  if (fn2())
+;
+  else {
+char c[5];
+c[0] = 5;
+  lbl_608:
+fn1(c[9]);
+int d = c[9];
+c[2] | a;
+d = c[b];
+  }
+  goto lbl_608;
+}
+
+int main() { return 0; }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 145a07c..3457aac 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3242,6 +3242,7 @@ sra_modify_assign (gimple stmt, gimple_stmt_iterator *gsi)
 }
   else if (racc
   && !racc->grp_unscalarized_data
+  && !racc->grp_unscalarizable_region
   && TREE_CODE (lhs) == SSA_NAME
   && !access_has_replacements_p (racc))
 {
@@ -3405,7 +3406,8 @@ sra_modify_assign (gimple stmt, gimple_stmt_iterator *gsi)
   else
{
  if (access_has_children_p (racc)
- && !racc->grp_unscalarized_data)
+ && !racc->grp_unscalarized_data
+ && TREE_CODE (lhs) != SSA_NAME)
{
  if (dump_file)
{


[hsa] Consolidate GTY roots for trees used during expansion to HSA

2016-03-07 Thread Martin Jambor
Hi,

when testing the experimental hsa branch, where dynamic parallelism is
not disabled and get_hsa_kernel_dispatch_offset is executed quite a
bit more frequently, I have come across hsa_kernel_dispatch_type being
freed by gcc even though it is marked with a GTY flag.  The reason is
that the file hsa-gen.c is not listed in GTFILES in Makefile.in (and
thus it does not have and does not include its gcc header file).  This
was the only intended GTY root in this file but there is another one
in hsa-brig.c for lists of statements to put into static
constructors and destructors.

Rather than adding these files to GTFILES, I have decided to move the
tree roots to hsa.c, which is already listed there, because GTY roots are
really only needed for these few rather special occasions.  The patch below,
which does just that, passed bootstrap and testing and HSA testing on
both trunk and the branch.  I will commit it in a few moments.
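
A minimal, self-contained sketch of the accessor pattern described above.
It is not the patch itself: the `tree` type is a stand-in, the names
get_kernel_dispatch_type and lookup_dispatch_type are hypothetical, and
the static variable is a plain C static, whereas in GCC it would carry a
GTY marker in a file that is listed in GTFILES.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GCC's tree pointer type so the sketch compiles on its own.  */
struct tree_node { int code; };
typedef struct tree_node *tree;

/* The single root.  In GCC this would be marked with GTY(()) and live in a
   file scanned by gengtype; here it is an ordinary static (assumption).  */
static tree kernel_dispatch_type = NULL;

/* Return the address of the root so that code in other files can both read
   it and lazily initialize it.  */
tree *
get_kernel_dispatch_type (void)
{
  return &kernel_dispatch_type;
}

/* Stands in for the node that make_node (RECORD_TYPE) would create.  */
static struct tree_node record_type_node = { 1 };

/* Caller-side lazy initialization, mirroring the shape of the hsa-gen.c
   hunk: fetch the slot, fill it on first use, return its contents.  */
tree
lookup_dispatch_type (void)
{
  tree *slot = get_kernel_dispatch_type ();
  assert (slot != NULL);
  if (*slot == NULL)
    *slot = &record_type_node;
  return *slot;
}
```

Returning a pointer to the slot rather than its value is what lets the
caller keep the "create on first use" logic while the storage itself stays
in one translation unit.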

Thanks,

Martin


2016-03-02  Martin Jambor  

* hsa.h (hsa_get_ctor_statements): Declare.
(hsa_get_dtor_statements): Likewise.
(hsa_get_kernel_dispatch_type): Likewise.
* hsa.c (hsa_get_ctor_statements): New function.
(hsa_get_dtor_statements): Likewise.
(hsa_get_kernel_dispatch_type): Likewise.
* hsa-brig.c (hsa_cdtor_statements): Removed.
(hsa_output_libgomp_mapping): Use hsa_get_ctor_statements and
hsa_get_dtor_statements.
* hsa-gen.c (hsa_kernel_dispatch_type): Removed.
(get_hsa_kernel_dispatch_offset): Use hsa_get_kernel_dispatch_type.
---
 gcc/hsa-brig.c | 14 ++
 gcc/hsa-gen.c  | 13 ++---
 gcc/hsa.c  | 25 +
 gcc/hsa.h  |  3 +++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 61cfd8b..2a301be 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -2006,8 +2006,6 @@ hsa_brig_emit_omp_symbols (void)
   emit_directive_variable (hsa_num_threads);
 }
 
-static GTY(()) tree hsa_cdtor_statements[2];
-
 /* Create and return __hsa_global_variables symbol that contains
all informations consumed by libgomp to link global variables
with their string names used by an HSA kernel.  */
@@ -2408,6 +2406,7 @@ hsa_output_libgomp_mapping (tree brig_decl)
 = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_REGISTER);
   gcc_checking_assert (offload_register);
 
+  tree *hsa_ctor_stmts = hsa_get_ctor_statements ();
   append_to_statement_list
 (build_call_expr (offload_register, 4,
  build_int_cstu (unsigned_type_node,
@@ -2416,15 +2415,15 @@ hsa_output_libgomp_mapping (tree brig_decl)
  build_fold_addr_expr (hsa_libgomp_host_table),
  build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
  build_fold_addr_expr (hsa_img_descriptor)),
- &hsa_cdtor_statements[0]);
+ hsa_ctor_stmts);
 
-  cgraph_build_static_cdtor ('I', hsa_cdtor_statements[0],
-DEFAULT_INIT_PRIORITY);
+  cgraph_build_static_cdtor ('I', *hsa_ctor_stmts, DEFAULT_INIT_PRIORITY);
 
   tree offload_unregister
 = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_UNREGISTER);
   gcc_checking_assert (offload_unregister);
 
+  tree *hsa_dtor_stmts = hsa_get_dtor_statements ();
   append_to_statement_list
 (build_call_expr (offload_unregister, 4,
  build_int_cstu (unsigned_type_node,
@@ -2433,9 +2432,8 @@ hsa_output_libgomp_mapping (tree brig_decl)
  build_fold_addr_expr (hsa_libgomp_host_table),
  build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
  build_fold_addr_expr (hsa_img_descriptor)),
- &hsa_cdtor_statements[1]);
-  cgraph_build_static_cdtor ('D', hsa_cdtor_statements[1],
-DEFAULT_INIT_PRIORITY);
+ hsa_dtor_stmts);
+  cgraph_build_static_cdtor ('D', *hsa_dtor_stmts, DEFAULT_INIT_PRIORITY);
 }
 
 /* Emit the brig module we have compiled to a section in the final assembly and
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index d7d39f0..fc59fa5 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3772,20 +3772,19 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   hbb->append_insn (basic);
 }
 
-static GTY (()) tree hsa_kernel_dispatch_type = NULL;
-
 /* Return byte offset of a FIELD_NAME in GOMP_hsa_kernel_dispatch which
is defined in plugin-hsa.c.  */
 
 static HOST_WIDE_INT
 get_hsa_kernel_dispatch_offset (const char *field_name)
 {
-  if (hsa_kernel_dispatch_type == NULL)
+  tree *hsa_kernel_dispatch_type = hsa_get_kernel_dispatch_type ();
+  if (*hsa_kernel_dispatch_type == NULL)
 {
   /* Collection of information needed for a dispatch of a kernel from a
 kernel.  Keep in sync with libgomp's plugin-hsa.c.  */
 
-  hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
+  *hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
 

[hsa testsuite 0/5] Re-post of all pending patches adjusting testsuite for HSA

2016-03-07 Thread Martin Jambor
Hi,

in order to consolidate things, I have decided to re-post all "hsa
testsuite" patches under this thread.  With the patches applied, we do
not get any spurious failures because of hsa warnings or libgomp
testcases failing because they are run on the host fallback.
Moreover, the first patch adds simple dump-scan compile-time
gridification tests and the last patch adds a special directory for
run-time C tests of hsa which are run only when HSA devices are
actually selected for offloading.  In the future, I'll likely propose
similar C++ and Fortran directories.

All patches were tested by running the whole testsuite on patched
trunk:

  - that was configured for all languages except go but not configured
for HSA,

  - that was configured for all languages except go and also for HSA
offloading, but an HSA device was not present on the machine, and

  - running the whole suite after configuring trunk for C, C++ and
Fortran on a computer with an HSA APU,

and subsequently comparing generated .sum files with unpatched trunk.

Thanks for any feedback (and approvals ;-),

Martin


[hsa testsuite 3/5] Suppress hsa warnings in libgomp tests

2016-03-07 Thread Martin Jambor
Hi,

just like with the compiler gomp testsuite, we need to add -Wno-hsa to
options when compiling libgomp testcases in order not to have "excess
errors" failures when HSA is enabled.  There are quite a few such
testcases on the trunk because I have disabled the dynamic parallelism
way of executing stuff.

The patch below adds the option to all libgomp testsuite compilations,
so that people who are not interested in HSA do not need to care.  The
patch has been tested both with and without HSA enabled.  OK for
trunk?

Thanks,

Martin


2016-03-04  Martin Jambor  

* testsuite/lib/libgomp.exp (libgomp_init): Append -Wno-hsa to
ALWAYS_CFLAGS.
---
 libgomp/testsuite/lib/libgomp.exp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 154a447..bbc2c26 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -237,6 +237,9 @@ proc libgomp_init { args } {
 # Disable caret
 lappend ALWAYS_CFLAGS "additional_flags=-fno-diagnostics-show-caret"
 
+# Disable HSA warnings by default.
+lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
+
 # Disable color diagnostics
 lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never"
 
-- 
2.7.1



[hsa testsuite 1/5] Gridification tests

2016-03-07 Thread Martin Jambor
Hi,

the patch below adds a DejaGNU effective target predicate (is that the
correct dejagnu term?) offload_hsa so that selected tests can be run
only if the hsa offloading is enabled.  I hope it is fairly standard
stuff.  Additionally, it adds one C/C++ and one Fortran test to
check that gridification happens.

Tested, both with and without HSA enabled.  OK for trunk?

Thanks,

Martin

2016-02-10  Martin Jambor  

* target-supports.exp (check_effective_target_offload_hsa): New.
* c-c++-common/gomp/gridify-1.c: New test.
* gfortran.dg/gomp/gridify-1.f90: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/gridify-1.c  | 54 
 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 | 16 +
 gcc/testsuite/lib/target-supports.exp|  8 +
 3 files changed, 78 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90

diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c 
b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
new file mode 100644
index 000..ba7a866
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+void
+foo1 (int n, int *a, int workgroup_size)
+{
+  int i;
+#pragma omp target
+#pragma omp teams thread_limit(workgroup_size)
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
+for (i = 0; i < n; i++)
+  a[i]++;
+}
+
+void
+foo2 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
+for (i = j + 1; i < n; i++)
+  a[i] = i;
+}
+
+void
+foo3 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+a[i] = i;
+}
+
+void
+foo4 (int j, int n, int *a)
+{
+#pragma omp parallel
+  {
+#pragma omp single
+{
+  int i;
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+   a[i] = i;
+}
+  }
+}
+
+
+/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified GPGPU kernel" 4 "omplower" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
new file mode 100644
index 000..00ff7f5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-require-effective-target offload_hsa }
+! { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+subroutine vector_square(n, a, b)
+  integer i, n, b(n), a(n)
+!$omp target teams
+!$omp distribute parallel do
+  do i=1,n
+  b(i) = a(i) * a(i)
+  enddo
+!$omp end distribute parallel do
+!$omp end target teams
+end subroutine vector_square
+
+! { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } }
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 0b4252f..fac4c3c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6936,3 +6936,11 @@ proc check_effective_target_offload_nvptx { } {
int main () {return 0;}
 } "-foffload=nvptx-none" ]
 }
+
+# Return 1 if the compiler has been configured with hsa offloading.
+
+proc check_effective_target_offload_hsa { } {
+return [check_no_compiler_messages offload_hsa assembly {
+   int main () {return 0;}
+} "-foffload=hsa" ]
+}
-- 
2.7.1



[hsa testsuite 5/5] New directory for HSA-specific C testcases

2016-03-07 Thread Martin Jambor
Hi,

we would like a place to have some HSA-specific tests, which would
run only when HSA is enabled at configuration time and HSA hardware
is also present and used for offloading.

I have proposed the first version of this patch as
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01817.html and got some
feedback from Mike Stump in
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00086.html.  I hope I
have incorporated his suggestions.  As I wrote in the cover letter, it
is likely I'll propose similar C++ and Fortran directories in the
future.

Is the patch OK for trunk?

Thanks,

Martin


2016-03-03  Martin Jambor  

* testsuite/lib/libgomp.exp
(check_effective_target_hsa_offloading_selected_nocache): New.
(check_effective_target_hsa_offloading_selected): Likewise.
* testsuite/libgomp.hsa.c/c.exp: Likewise.
* testsuite/libgomp.hsa.c/alloca-1.c: Likewise.
* testsuite/libgomp.hsa.c/bitfield-1.c: Likewise.
* testsuite/libgomp.hsa.c/builtins-1.c: Likewise.
* testsuite/libgomp.hsa.c/complex-1.c: Likewise.
* testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise.
* testsuite/libgomp.hsa.c/function-call-1.c: Likewise.
* testsuite/libgomp.hsa.c/get-level-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-2.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-3.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-4.c: Likewise.
* testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise.
* testsuite/libgomp.hsa.c/pr69568.c: Likewise.
* testsuite/libgomp.hsa.c/rotate-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp  |  53 +++
 libgomp/testsuite/libgomp.hsa.c/alloca-1.c |  25 
 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c   | 160 +
 libgomp/testsuite/libgomp.hsa.c/builtins-1.c   |  97 +
 libgomp/testsuite/libgomp.hsa.c/c.exp  |  42 ++
 libgomp/testsuite/libgomp.hsa.c/complex-1.c|  65 +
 .../testsuite/libgomp.hsa.c/formal-actual-args-1.c |  83 +++
 libgomp/testsuite/libgomp.hsa.c/function-call-1.c  |  50 +++
 libgomp/testsuite/libgomp.hsa.c/get-level-1.c  |  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-1.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-2.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-3.c|  39 +
 libgomp/testsuite/libgomp.hsa.c/gridify-4.c|  45 ++
 .../testsuite/libgomp.hsa.c/memory-operations-1.c  |  92 
 libgomp/testsuite/libgomp.hsa.c/pr69568.c  |  41 ++
 libgomp/testsuite/libgomp.hsa.c/rotate-1.c |  39 +
 libgomp/testsuite/libgomp.hsa.c/switch-1.c | 145 +++
 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c  | 116 +++
 18 files changed, 1170 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/alloca-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/builtins-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/c.exp
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/complex-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/formal-actual-args-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/function-call-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/get-level-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-2.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-4.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/memory-operations-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/pr69568.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/rotate-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index bbc2c26..0d5b6d4 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -395,3 +395,56 @@ proc check_effective_target_openacc_host_selected { } {
 }
 return 0;
 }
+
+# Return 1 if the selected OMP device is actually a HSA device
+
+proc check_effective_target_hsa_offloading_selected_nocache {} {
+global tool
+
+set src {
+   int main () {
+   int v = 1;
+   #pragma omp target map(from:v)
+   v = 0;
+   return v;
+   }
+}
+
+set result [eval [list check_compile hsa_offloading_src executable $src] ""]
+set lines [lindex $result 0]
+set output [lindex $result 1]
+
+set ok 0
+if { [string 

[hsa testsuite 4/5] Adjust libgomp tests that do not work on host fallback

2016-03-07 Thread Martin Jambor
Hi,

this patch avoids run-time failures in libgomp testsuite that
currently happen when HSA offloading is actually used.  All of these
tests require the offload_device effective target, which the patch
changes to offload_device_nonshared_as.

For some tests, such as libgomp.c/examples-4/device-1.c this is
clearly just the correct thing to do because the test explicitly
checks that changes that happen in a target construct and are not
"mapped" back are not observable on the host.

However, the majority of the tests have a different problem.  If a test
for some reason is not compiled into HSAIL (usually because it would
require the dynamic parallelism path which is disabled or because it
calls abort from within target which HSA so far cannot handle), the
host fallback is called, even though the test actually is not supposed
to run on it.  Such problematic tests then call
omp_is_initial_device to verify they are not running on the host and
decide to fail when they figure out they are.
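
The check such tests perform can be sketched as follows.  This is only an
illustration under stated assumptions, not code from any of the affected
testcases: the function name target_ran_on_host is hypothetical, and the
stub for omp_is_initial_device exists only so that the sketch also builds
without -fopenmp (in which case the pragma is ignored and the "host" answer
is returned).

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Without -fopenmp the target pragma is ignored, so the region runs on
   the host; this stub (an assumption of the sketch) reflects that.  */
static int
omp_is_initial_device (void)
{
  return 1;
}
#endif

/* Return 1 if the target region was executed on the host (for example
   because the host fallback was used), 0 if it ran on an offload device.  */
int
target_ran_on_host (void)
{
  int on_host = 1;
#pragma omp target map(from: on_host)
  on_host = omp_is_initial_device () != 0;
  return on_host;
}
```

A test written with a non-shared-memory accelerator in mind would typically
abort when this returns 1, which is why a silent host fallback turns into a
run-time failure.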

Changing the effective target only to devices with non-shared memory
probably isn't really the correct fix.  We basically want to disable
the host fallback for them regardless of address spaces but I cannot
think of a simple and generic way of doing that.  However, all
testcases for non-shared memory devices were written with disallowed
fallback in mind and so this solution also gives the desired result.
Perhaps we need something better for the long term; any suggestions
are welcome.

Tested both with and without HSA (enabled or present).  OK for trunk?

Thanks,

Martin

2016-02-12  Martin Jambor  

libgomp/
* testsuite/libgomp.c/examples-4/async_target-2.c: Only run on
non-shared memory accelerators.
* testsuite/libgomp.c/examples-4/device-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target-5.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-6.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-7.c: Likewise.
* testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise.
---
 libgomp/testsuite/libgomp.c/examples-4/async_target-2.c | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/device-1.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target-5.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-6.c  | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-7.c  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/async_target-2.f90 | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target-5.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-6.f90  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-7.f90  | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c 
b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
index ce63328..0c76f8e 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/device-1.c 
b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
index dad8572..46aa160 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/device-1.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target-5.c 
b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
index 1853fba..1c14bae 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target-5.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c 
b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
index affeb49..57c7c0c 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/test

[hsa testsuite 2/5] Suppress hsa warnings in compiler gomp tests

2016-03-07 Thread Martin Jambor
Hi,

as Jakub requested, this patch deals with HSA "excess errors" in the
gomp compiler testsuite by passing -Wno-hsa to all of them.  After
discussing this in the thread about similar libgomp tests[1] (which
are however handled differently), Jakub expressed preference for
passing the option in default_extra_flags rather than flags so that
names of the tests do not change.

This however requires that the failing tests which use dg-options
be adjusted.  There are 9 of them; most have just a superfluous
-fopenmp which can be removed because that is the default, and
the rest are handled by turning dg-options into dg-additional-options.

OK for trunk?

Thanks,

Martin

[1] https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00381.html


2016-03-04  Martin Jambor  

* c-c++-common/gomp/clauses-1.c: Remove dg-options.
* c-c++-common/gomp/if-1.c: Likewise.
* c-c++-common/gomp/pr61486-2.c: Likewise.
* c-c++-common/gomp/target-teams-1.c: Moved dg-options except -fopenmp
to dg-additional-options.
* g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
* g++.dg/gomp/target-teams-1.C: Moved dg-options except -fopenmp to
dg-additional-options.
* gcc.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
* gcc.dg/gomp/pr68128-2.c: Moved dg-options except -fopenmp to
dg-additional-options.
* gfortran.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
* gfortran.dg/gomp/target1.f90: Remove dg-options.
* gfortran.dg/gomp/target2.f90: Moved dg-options except -fopenmp to
dg-additional-options.
* gfortran.dg/gomp/target3.f90: Remove dg-options.
---
 gcc/testsuite/c-c++-common/gomp/clauses-1.c  | 1 -
 gcc/testsuite/c-c++-common/gomp/if-1.c   | 1 -
 gcc/testsuite/c-c++-common/gomp/pr61486-2.c  | 1 -
 gcc/testsuite/c-c++-common/gomp/target-teams-1.c | 2 +-
 gcc/testsuite/g++.dg/gomp/gomp.exp   | 2 +-
 gcc/testsuite/g++.dg/gomp/target-teams-1.C   | 2 +-
 gcc/testsuite/gcc.dg/gomp/gomp.exp   | 2 +-
 gcc/testsuite/gcc.dg/gomp/pr68128-2.c| 2 +-
 gcc/testsuite/gfortran.dg/gomp/gomp.exp  | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target1.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/target2.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target3.f90   | 1 -
 12 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/clauses-1.c 
b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
index 2d1c352..91aed39 100644
--- a/gcc/testsuite/c-c++-common/gomp/clauses-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 /* { dg-additional-options "-std=c99" { target c } } */
 
 int t;
diff --git a/gcc/testsuite/c-c++-common/gomp/if-1.c 
b/gcc/testsuite/c-c++-common/gomp/if-1.c
index 4ba708c..3a9b538 100644
--- a/gcc/testsuite/c-c++-common/gomp/if-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/if-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 
 void
 foo (int a, int b, int *p, int *q)
diff --git a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c 
b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
index db97143..4a68023 100644
--- a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
@@ -1,6 +1,5 @@
 /* PR middle-end/61486 */
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 /* { dg-require-effective-target alloca } */
 
 #pragma omp declare target
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c 
b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 0a707c2..51b8d48 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/g++.dg/gomp/gomp.exp 
b/gcc/testsuite/g++.dg/gomp/gomp.exp
index 7365389..d26596c 100644
--- a/gcc/testsuite/g++.dg/gomp/gomp.exp
+++ b/gcc/testsuite/g++.dg/gomp/gomp.exp
@@ -29,7 +29,7 @@ dg-init
 # Main loop.
 g++-dg-runtest [lsort [concat \
[find $srcdir/$subdir *.C] \
-   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp -Wno-hsa"
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C 
b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index 0a97de0..f78a608 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-fopenmp -fdump-tree-gimple" }
+// { dg-additional-options "-fdump-tree-gimple" }
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/gcc.dg/gomp/gomp.exp 
b/gcc/testsuite/gcc.dg/gomp/gomp.exp
index 78623fc..b6b5932 100644
--- a/gcc/testsuit

Re: [RFC][PR69708] IPA inline not working for function reference in static const struc

2016-03-10 Thread Martin Jambor
rough jump
functions, as shown by xfailing ipcp-cstagg-7.c testcase.  To fix
that, we'd either have to force propagation of aggregate values from
constant globals even through jump functions that have agg_preserved
flag cleared, or, and I think this is perhaps a better idea, rethink
the whole approach, give up creating aggregate jump functions and
instead use normal scalar propagation (even for non-scalar types, if
they are exact copies of a read-only aggregate) and change the
consumers so that they use the static initializers to look up the
value.  This would also have the added advantage that parameter
PARAM_IPA_MAX_AGG_ITEMS would not be an issue.

So the current effort below is basically only for reference; hopefully
we'll be able to implement the second approach at some point during
stage1.
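
For readers without the PR at hand, the kind of source the subject line
refers to looks roughly like the sketch below.  It is an assumption made for
illustration, not the testcase from the PR or from the patch; all names in
it are hypothetical.  The point is that the callee reads a function pointer
out of a read-only aggregate, so the call can only be resolved (and
inlined) if the constant contents of that aggregate are propagated to it.

```c
/* A read-only table of operations holding a function reference.  */
struct ops
{
  int (*fn) (int);
};

static int
twice (int x)
{
  return 2 * x;
}

static const struct ops my_ops = { twice };

/* The indirect call through O->fn is what interprocedural propagation
   would need to turn into a direct call to twice.  */
static int
apply (const struct ops *o, int v)
{
  return o->fn (v);
}

int
use (int v)
{
  return apply (&my_ops, v);
}
```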

Thanks for raising this issue,

Martin

2016-03-10  Martin Jambor  

* ipa-prop.c (count_constants_in_agg_constructor): New function.
(build_agg_jump_func_from_constructor): Likewise.
(determine_locally_known_aggregate_parts): Use them to process
global constant variables.
(parm_preserved_before_stmt_p): Return true for loads from
TREE_READONLY parameters.

testsuite/
* gcc.dg/ipa/ipcp-cstagg-1.c: New test.
* gcc.dg/ipa/ipcp-cstagg-2.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-3.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-4.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-5.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-6.c: Likewise.
* gcc.dg/ipa/ipcp-cstagg-7.c: Likewise.
---
 gcc/ipa-prop.c   | 141 +++
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-1.c |  32 +++
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c |  39 +
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-3.c |  37 
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-4.c |  39 +
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-5.c |  59 +
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-6.c |  81 ++
 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-7.c |  46 ++
 8 files changed, 474 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-1.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-5.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-6.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-7.c

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index d62c704..e0bc307 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -803,6 +803,11 @@ parm_preserved_before_stmt_p (struct ipa_func_body_info 
*fbi, int index,
   bool modified = false;
   ao_ref refd;
 
+  tree base = get_base_address (parm_load);
+  gcc_assert (TREE_CODE (base) == PARM_DECL);
+  if (TREE_READONLY (base))
+return true;
+
   /* FIXME: FBI can be NULL if we are being called from outside
  ipa_node_analysis or ipcp_transform_function, which currently happens
  during inlining analysis.  It would be great to extend fbi's lifetime and
@@ -1395,6 +1400,121 @@ build_agg_jump_func_from_list (struct 
ipa_known_agg_contents_list *list,
 }
 }
 
+/* Return how many interprocedural scalar invariants there are in a static
+   CONSTRUCTOR of a variable.  */
+
+static unsigned
+count_constants_in_agg_constructor (tree constructor)
+{
+  unsigned res = 0, max = PARAM_VALUE (PARAM_IPA_MAX_AGG_ITEMS);
+  unsigned ix;
+  tree index, val;
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val)
+{
+  if (TREE_CODE (TREE_TYPE (constructor)) == RECORD_TYPE
+ && !index)
+   /* We cannot handle field elements that do not have the field decl as
+  its index.  */
+   continue;
+  if (is_gimple_reg_type (TREE_TYPE (val))
+ && is_gimple_ip_invariant (val))
+   res++;
+  else if (TREE_CODE (val) == CONSTRUCTOR)
+   res += count_constants_in_agg_constructor (val);
+
+  if (res > max)
+   return max;
+}
+  return res;
+}
+
+/* Push invariants from static constructor of a global variable into JFUNC's
+   aggregate jump function.  BASE_OFFSET is the offset which should be added to
+   offset of each value.  It can be negative to represent that only a part of
+   an aggregate starting at BASE_OFFSET is being passed as an actual
+   argument.  */
+
+static void
+build_agg_jump_func_from_constructor (tree constructor,
+ HOST_WIDE_INT base_offset,
+ struct ipa_jump_func *jfunc)
+{
+  tree type = TREE_TYPE (constructor);
+  if (TREE_CODE (type) != ARRAY_TYPE
+  && TREE_CODE (type) != RECORD_TYPE)
+return;
+
+  unsigned ix;
+  tree index, val;
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val)
+{
+  if (!jfunc->agg.items->space (1))
+  

[omp] Create openmp -fopt-info optimization group

2016-03-19 Thread Martin Jambor
Hi,

the following patch does two things.  First, it creates a new optinfo
group for OpenMP and moves OpenMP lowering and expansion to this
group.  Second, it changes all gridification MSG_NOTE dumps to
MSG_MISSED_OPTIMIZATION, which is more appropriate.  (Apparently, I
remembered to change the dump about performed gridification to
MSG_OPTIMIZED_LOCATIONS last autumn but failed to do it for dumps with
failure reasons).

With these changes, users that configured their compiler with HSA can
use (for example) the -fopt-info-all-openmp option to get information
about which target constructs have been gridified and which were not:

  mjambor@virgil:~/gcc/hsa/tests/grid$ ~/gcc/hsa/inst/bin/gcc -fopenmp -O combined-hsa.c -fopt-info-all-openmp
  combined-hsa.c:9:9: note: Target construct will be turned into a gridified GPGPU kernel

or

  
  /home/mjambor/gcc/hsa/src/libgomp/testsuite/libgomp.c/examples-4/target_data-3.c:50:10: note: Will not turn target construct into a simple GPGPU kernel because it does not have a sole teams construct in it.

and so forth.
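
How a group name such as "openmp" ends up selecting a flag bit can be
sketched as a small table lookup.  This is a simplified model, not the
dumpfile.c code: the GROUP_* macros, struct group_option and
group_flag_for_name are hypothetical names, and only the bit values that
appear in the dumpfile.h hunk of this patch are used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values as in the dumpfile.h hunk; the macro names are made up.  */
#define GROUP_LOOP    (1 << 2)
#define GROUP_INLINE  (1 << 3)
#define GROUP_VEC     (1 << 4)
#define GROUP_OPENMP  (1 << 5)

struct group_option
{
  const char *name;
  unsigned flag;
};

/* NULL-terminated table mapping option-name suffixes to group bits.  */
static const struct group_option group_options[] = {
  { "loop", GROUP_LOOP },
  { "inline", GROUP_INLINE },
  { "openmp", GROUP_OPENMP },
  { "vec", GROUP_VEC },
  { NULL, 0 }
};

/* Return the flag bit for NAME, or 0 if NAME is not a known group.  */
unsigned
group_flag_for_name (const char *name)
{
  assert (name != NULL);
  for (const struct group_option *p = group_options; p->name != NULL; p++)
    if (strcmp (p->name, name) == 0)
      return p->flag;
  return 0;
}
```

Because each group is a distinct bit, a pass that sets its optinfo flags to
the OpenMP group is reported only when that bit is among the requested ones.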

I have bootstrapped and tested the patch on x86_64-linux (with and
without configured HSA) and by running make info and examining the
generated info files.  Since it is only a dumping change, I'd like to
propose it for trunk even at this late stage.  If release managers
however do not think it is desirable, I'll commit it to the hsa branch
and propose to trunk again once stage1 opens.

Thanks,

Martin


2016-03-14  Martin Jambor  

* doc/invoke.texi (-fopt-info): Document openmp optimization group.
* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.
* dumpfile.c (optgroup_options): Add entry for OpenMP optimizations.
* dumpfile.h (OPTGROUP_OPENMP): New define.
* omp-low.c (pass_data_expand_omp): Change optinfo_flags to
OPTGROUP_OPENMP.
(pass_data_expand_omp_ssa): Likewise.
(pass_data_lower_omp): Likewise.
(pass_data_omp_simd_clone): Likewise.
(grid_find_single_omp_among_assignments_1): Changed all occurrences of
MSG_NOTE to MSG_MISSED_OPTIMIZATION.
(grid_find_single_omp_among_assignments): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
---
 gcc/doc/invoke.texi  |  2 ++
 gcc/doc/optinfo.texi |  3 +++
 gcc/dumpfile.c   |  1 +
 gcc/dumpfile.h   |  3 ++-
 gcc/omp-low.c| 56 ++--
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99ac11b..5c798a4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12194,6 +12194,8 @@ Enable dumps from all interprocedural optimizations.
 Enable dumps from all loop optimizations.
 @item inline
 Enable dumps from all inlining optimizations.
+@item openmp
+Enable dumps from OpenMP optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
 @item optall
diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 3c8fdba..20ca560 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
+@item OPTGROUP_OPENMP
+OpenMP passes. Enabled by @option{-openmp}.
+
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 144e371..f2430f3 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -136,6 +136,7 @@ static const struct dump_option_value_info 
optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
+  {"openmp", OPTGROUP_OPENMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index c168cbf..72f696b 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,7 +97,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP(1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE  (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER   (1 << 5)   /* All other passes */
+#define OPTGROUP_OPENMP  (1 << 5)  /* OpenMP specific transformations */
+#define OPTGROUP_OTHER   (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL(OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
   | OPTGROUP_VEC | OPTGROUP_OTHER)
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 82dec9d..6f42717 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13990,7 +13990,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_require

[hsa branch] Use an obstack instead of multiple alloc pools

2016-03-19 Thread Martin Jambor
Hi,

when I started working on expansion to HSAIL almost three years ago, I
decided to allocate memory for most of the structures from various
alloc-pools for reasons that never materialized and the number of
pools later grew to unreasonable numbers.  So after an internal
discussion, Martin Liska wrote the following patch which changes
allocations from the various hsa alloc pools to allocations from one
obstack.  I have just committed the patch to the hsa branch after
testing it.
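
For readers unfamiliar with obstacks, here is a minimal sketch of the
idea, assuming glibc's <obstack.h> and made-up demo_ structures (not
the real hsa_op_* and hsa_insn_* classes, which are C++ and overload
operator new): objects of different sizes come from one arena and are
all released together.

```c
#include <obstack.h>
#include <stdlib.h>

/* obstack.h expects the user to name the chunk allocator and
   deallocator through these two macros.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical stand-ins for two differently sized structures.  */
struct demo_operand
{
  int kind;
  long value;
};

struct demo_insn
{
  int opcode;
  struct demo_operand *op0;
};

static struct obstack demo_obstack;

/* Set up the arena; would correspond to per-function initialization.  */
void
demo_init (void)
{
  obstack_init (&demo_obstack);
}

/* Allocate an operand from the shared arena.  */
struct demo_operand *
demo_new_operand (int kind, long value)
{
  struct demo_operand *op = obstack_alloc (&demo_obstack, sizeof *op);
  op->kind = kind;
  op->value = value;
  return op;
}

/* Allocate an instruction from the very same arena.  */
struct demo_insn *
demo_new_insn (int opcode, struct demo_operand *op0)
{
  struct demo_insn *insn = obstack_alloc (&demo_obstack, sizeof *insn);
  insn->opcode = opcode;
  insn->op0 = op0;
  return insn;
}

/* Free every object at once.  Passing NULL leaves the obstack
   uninitialized, so demo_init must be called again before reuse.  */
void
demo_release (void)
{
  obstack_free (&demo_obstack, NULL);
}
```

The point of the design is that no per-type pool is needed; one
obstack_free call at the end replaces the release of every pool.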

Thanks,

Martin


2016-03-17  Martin Liska  
Martin Jambor 

* hsa-gen.c (hsa_allocp_operand_address): Removed.
(hsa_allocp_operand_immed): Likewise.
(hsa_allocp_operand_reg): Likewise.
(hsa_allocp_operand_code_list): Likewise.
(hsa_allocp_operand_operand_list): Likewise.
(hsa_allocp_inst_basic): Likewise.
(hsa_allocp_inst_phi): Likewise.
(hsa_allocp_inst_mem): Likewise.
(hsa_allocp_inst_atomic): Likewise.
(hsa_allocp_inst_signal): Likewise.
(hsa_allocp_inst_seg): Likewise.
(hsa_allocp_inst_cmp): Likewise.
(hsa_allocp_inst_br): Likewise.
(hsa_allocp_inst_sbr): Likewise.
(hsa_allocp_inst_call): Likewise.
(hsa_allocp_inst_arg_block): Likewise.
(hsa_allocp_inst_comment): Likewise.
(hsa_allocp_inst_queue): Likewise.
(hsa_allocp_inst_srctype): Likewise.
(hsa_allocp_inst_packed): Likewise.
(hsa_allocp_inst_cvt): Likewise.
(hsa_allocp_inst_alloca): Likewise.
(hsa_allocp_bb): Likewise.
(hsa_obstack): New.
(hsa_init_data_for_cfun): Initialize obstack.
(hsa_deinit_data_for_cfun): Release memory of the obstack.
(hsa_op_immed::operator new): Use obstack instead of
object_allocator.
(hsa_op_reg::operator new): Likewise.
(hsa_op_address::operator new): Likewise.
(hsa_op_code_list::operator new): Likewise.
(hsa_op_operand_list::operator new): Likewise.
(hsa_insn_basic::operator new): Likewise.
(hsa_insn_phi::operator new): Likewise.
(hsa_insn_br::operator new): Likewise.
(hsa_insn_sbr::operator new): Likewise.
(hsa_insn_cmp::operator new): Likewise.
(hsa_insn_mem::operator new): Likewise.
(hsa_insn_atomic::operator new): Likewise.
(hsa_insn_signal::operator new): Likewise.
(hsa_insn_seg::operator new): Likewise.
(hsa_insn_call::operator new): Likewise.
(hsa_insn_arg_block::operator new): Likewise.
(hsa_insn_comment::operator new): Likewise.
(hsa_insn_srctype::operator new): Likewise.
(hsa_insn_packed::operator new): Likewise.
(hsa_insn_cvt::operator new): Likewise.
(hsa_insn_alloca::operator new): Likewise.
(hsa_init_new_bb): Likewise.
---
 gcc/hsa-gen.c | 227 ++
 1 file changed, 68 insertions(+), 159 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index f66eb53..36bc52d 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -38,7 +38,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "diagnostic-core.h"
-#include "alloc-pool.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "stringpool.h"
@@ -125,31 +124,7 @@ struct hsa_queue
   uint64_t id;
 };
 
-/* Alloc pools for allocating basic hsa structures such as operands,
-   instructions and other basic entities.  */
-static object_allocator *hsa_allocp_operand_address;
-static object_allocator *hsa_allocp_operand_immed;
-static object_allocator *hsa_allocp_operand_reg;
-static object_allocator *hsa_allocp_operand_code_list;
-static object_allocator *hsa_allocp_operand_operand_list;
-static object_allocator *hsa_allocp_inst_basic;
-static object_allocator *hsa_allocp_inst_phi;
-static object_allocator *hsa_allocp_inst_mem;
-static object_allocator *hsa_allocp_inst_atomic;
-static object_allocator *hsa_allocp_inst_signal;
-static object_allocator *hsa_allocp_inst_seg;
-static object_allocator *hsa_allocp_inst_cmp;
-static object_allocator *hsa_allocp_inst_br;
-static object_allocator *hsa_allocp_inst_sbr;
-static object_allocator *hsa_allocp_inst_call;
-static object_allocator *hsa_allocp_inst_arg_block;
-static object_allocator *hsa_allocp_inst_comment;
-static object_allocator *hsa_allocp_inst_queue;
-static object_allocator *hsa_allocp_inst_srctype;
-static object_allocator *hsa_allocp_inst_packed;
-static object_allocator *hsa_allocp_inst_cvt;
-static object_allocator *hsa_allocp_inst_alloca;
-static object_allocator *hsa_allocp_bb;
+static struct obstack hsa_obstack;
 
 /* List of pointers to all instructions that come from an object allocator.  */
 static vec  hsa_instructions;
@@ -467,52 +442,7 @@ static void
 hsa_init_data_for_cfun ()
 {
   hsa_init_compilation_unit_data ();
-

Re: [PATCH] Retry to emit global variables in HSA (PR hsa/70234)

2016-03-19 Thread Martin Jambor
Hi,

On Tue, Mar 15, 2016 at 12:59:03PM +0100, Martin Liska wrote:
> Hi.
> 
> As emission of a HSAIL function can fail for various reasons (-Whsa),
> we must guarantee that a global variable is declared, and at most once.
> 
> Following patch does that, patch can survive make check-target-libgomp and
> HSAILAsm is happy with BRIG output of declare_target-5.c source file.
> 
> Currently, I'm running bootstrap on x86_64-linux-gnu.
> Ready to install after it finishes?
> 
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2016-03-15  Martin Liska  
> 
>   PR hsa/70234
>   * hsa-brig.c (emit_function_directives): Mark unemitted
>   global variables for emission.
>   * hsa-gen.c (hsa_symbol::hsa_symbol): Initialize a new flag.
>   (get_symbol_for_decl): Likewise.
>   * hsa.h (struct hsa_symbol): New flag.
> ---
>  gcc/hsa-brig.c |  2 ++
>  gcc/hsa-gen.c  | 22 +++---
>  gcc/hsa.h  |  3 +++
>  3 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
> index 2a301be..9b6c0b8 100644
> --- a/gcc/hsa-brig.c
> +++ b/gcc/hsa-brig.c
> @@ -643,6 +643,8 @@ emit_function_directives (hsa_function_representation *f, 
> bool is_declaration)
>if (!f->m_declaration_p)
>  for (int i = 0; f->m_global_symbols.iterate (i, &sym); i++)
>{
> + gcc_assert (!sym->m_emitted_to_brig);
> + sym->m_emitted_to_brig = true;
>   emit_directive_variable (sym);
>   brig_insn_count++;
>}
> diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
> index 5939a57..473d4bd 100644
> --- a/gcc/hsa-gen.c
> +++ b/gcc/hsa-gen.c
> @@ -162,7 +162,7 @@ hsa_symbol::hsa_symbol ()
>  m_directive_offset (0), m_type (BRIG_TYPE_NONE),
>  m_segment (BRIG_SEGMENT_NONE), m_linkage (BRIG_LINKAGE_NONE), m_dim (0),
>  m_cst_value (NULL), m_global_scope_p (false), m_seen_error (false),
> -m_allocation (BRIG_ALLOCATION_AUTOMATIC)
> +m_allocation (BRIG_ALLOCATION_AUTOMATIC), m_emitted_to_brig (false)
>  {
>  }
>  
> @@ -174,7 +174,7 @@ hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t 
> segment,
>  m_directive_offset (0), m_type (type), m_segment (segment),
>  m_linkage (linkage), m_dim (0), m_cst_value (NULL),
>  m_global_scope_p (global_scope_p), m_seen_error (false),
> -m_allocation (allocation)
> +m_allocation (allocation), m_emitted_to_brig (false)
>  {
>  }
>  
> @@ -880,11 +880,27 @@ get_symbol_for_decl (tree decl)
>gcc_checking_assert (slot);
>if (*slot)
>  {
> +  hsa_symbol *sym = (*slot);
> +
>/* If the symbol is problematic, mark current function also as
>problematic.  */
> -  if ((*slot)->m_seen_error)
> +  if (sym->m_seen_error)
>   hsa_fail_cfun ();
>  
> +  /* PR hsa/70234: If a global variable was marked to be emitted,
> +  but HSAIL generation of a function using the variable fails,
> +  we should retry to emit the variable in context of a different
> +  function.
> +
> +  Iterate elements whether a symbol is already in m_global_symbols
> +  of not.  */
> +  for (unsigned i = 0; i < hsa_cfun->m_global_symbols.length (); i++)
> + if (hsa_cfun->m_global_symbols[i] == sym)
> +   return *slot;
> +
> +  if (is_in_global_vars && !sym->m_emitted_to_brig)
> + hsa_cfun->m_global_symbols.safe_push (sym);
> +

Hopefully the linear search in m_global_symbols never becomes
prohibitively expensive.  But it is only necessary when
is_in_global_vars is true, so at least we could do something like:

  if (is_in_global_vars && !sym->m_emitted_to_brig)
    {
      for (unsigned i = 0; i < hsa_cfun->m_global_symbols.length (); i++)
        if (hsa_cfun->m_global_symbols[i] == sym)
          return *slot;
      hsa_cfun->m_global_symbols.safe_push (sym);
    }
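
The same guard in a self-contained form, as a minimal sketch only: the
demo_ types and the fixed-size array are hypothetical stand-ins for
hsa_symbol and the m_global_symbols vector.  The scan is skipped
entirely unless the symbol is a global that has not been emitted yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_GLOBALS 16

/* Hypothetical stand-in for hsa_symbol.  */
struct demo_symbol
{
  bool emitted;
};

/* Hypothetical stand-in for the per-function list of global symbols.  */
struct demo_function
{
  struct demo_symbol *globals[DEMO_MAX_GLOBALS];
  size_t n;
};

/* Append SYM to the globals of F if it is a not-yet-emitted global that
   is not in the list already.  Return true only if SYM was appended.  */
bool
demo_note_global (struct demo_function *f, struct demo_symbol *sym,
                  bool is_global)
{
  /* Cheap test first, so the linear search below is rarely run.  */
  if (!is_global || sym->emitted)
    return false;

  for (size_t i = 0; i < f->n; i++)
    if (f->globals[i] == sym)
      return false;

  /* The sketch simply refuses to grow past its fixed capacity.  */
  if (f->n == DEMO_MAX_GLOBALS)
    return false;

  f->globals[f->n++] = sym;
  return true;
}
```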

OK with that change.  And even though I have seen the bug only on the
hsa branch, commit the fix to trunk too, I think it can happen there
as well.

Thanks a lot,

Martin


Re: [HSA, PATCH] Enhance dump output

2016-03-21 Thread Martin Jambor
Hi,

On Mon, Mar 21, 2016 at 12:14:19PM +0100, Martin Liska wrote:
> Hello.
> 
> Following patch enhances dump output for SBR instructions and
> provides a BRIG offset of HSA symbols. The change does not touch any
> code generation snippet and I hope it can be installed during the stage4?

yes, but...

> 
> Patch can bootstrap on x86_64-linux-gnu and survives
> make check-target-libgomp.
> 
> Ready for trunk?
> Thanks,
> Martin

> From f59542322d584a1c61bfbd0148c90671a89d0593 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 15 Mar 2016 11:57:30 +0100
> Subject: [PATCH] HSA: enhance dump output
> 
> gcc/ChangeLog:
> 
> 2016-03-15  Martin Liska  
> 
>   * hsa-dump.c (dump_hsa_insn_1): dump default branch of SBR
>   insns.
>   (dump_hsa_symbol): Dump BRIG offset of hsa_symbols.
> ---
>  gcc/hsa-dump.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
> index c5f1f69..ad0c8bf 100644
> --- a/gcc/hsa-dump.c
> +++ b/gcc/hsa-dump.c
> @@ -721,6 +721,10 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
>  
>if (symbol->m_type & BRIG_TYPE_ARRAY_MASK)
>  fprintf (f, "[%lu]", (unsigned long) symbol->m_dim);
> +
> +

...please remove the added newlines here...

> +  if (symbol->m_directive_offset)
> +fprintf (f, " /* BRIG offset: %u", 
> symbol->m_directive_offset);

...and I think you are missing an ending "*/" in the string you dump.
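
To make the second remark concrete, a minimal sketch of the format
string with the comment closed.  The helper name is made up, and it
writes into a buffer rather than to a FILE only so that the result is
easy to check:

```c
#include <stdio.h>

/* Hypothetical helper: write the BRIG offset annotation for OFFSET
   into BUF of SIZE bytes, including the closing comment marker.
   Return what snprintf returns.  */
int
demo_format_brig_offset (char *buf, size_t size, unsigned offset)
{
  return snprintf (buf, size, " /* BRIG offset: %u */", offset);
}
```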

>  }
>  
>  /* Dump textual representation of HSA IL operand OP to file F.  */
> @@ -929,7 +933,8 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int 
> *indent)
>   fprintf (f, ", ");
>   }
>  
> -  fprintf (f, "]");
> +  fprintf (f, "] /* default: BB %i */",
> +hsa_bb_for_bb (sbr->m_default_bb)->m_index);

I think I've approved this already?

Thanks,

Martin



Re: [HSA, PATCH] Allocate memory for shadow arg (PR hsa/70337)

2016-03-21 Thread Martin Jambor
Hi,

On Mon, Mar 21, 2016 at 01:49:25PM +0100, Martin Liska wrote:
> Hello.
> 
> Following patch fixes an invalid write in HSA plug-in.
> I've been running bootstrap and regression tests on x86-linux-gnu.
> 
> Ready after it finishes?
> Thanks,
> Martin

> From 2674ceb5fddeaeb26ff87d26a43bddaf40060ea2 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 21 Mar 2016 13:34:04 +0100
> Subject: [PATCH] Allocate memory for shadow arg (PR hsa/70337)
> 
> libgomp/ChangeLog:
> 
> 2016-03-21  Martin Liska  
> 
>   PR hsa/70337
>   * plugin/plugin-hsa.c (create_single_kernel_dispatch): Allocate
>   memory for hsa_kernel_runtime * argument.
> ---
>  libgomp/plugin/plugin-hsa.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
> index d888493..36b3cf4 100644
> --- a/libgomp/plugin/plugin-hsa.c
> +++ b/libgomp/plugin/plugin-hsa.c
> @@ -884,9 +884,10 @@ create_single_kernel_dispatch (struct kernel_info 
> *kernel,
>shadow->private_segment_size = kernel->private_segment_size;
>shadow->group_segment_size = kernel->group_segment_size;
>  
> -  status
> -= hsa_memory_allocate (agent->kernarg_region, 
> kernel->kernarg_segment_size,
> -&shadow->kernarg_address);
> +  size_t kernarg_size = kernel->kernarg_segment_size
> ++ sizeof (struct hsa_kernel_runtime *);

This is strange.  The pointer to the shadow data structure is, from
the HSA perspective, a normal kernel argument and therefore should
already be included in the kernel->kernarg_segment_size.  Have you
checked that the values are indeed off?
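
To illustrate what I mean, a minimal sketch under the assumption that
the kernarg segment holds the vars pointer first and, only for kernels
that take it, the shadow pointer right after it.  All demo_ names are
hypothetical and this is not the plugin code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's shadow structure.  */
struct demo_shadow
{
  int unused;
};

/* Fill KERNARG, which is SEGMENT_SIZE bytes long as reported for the
   kernel, with VARS and, if the kernel has room for it, the SHADOW
   pointer.  Return true on success and false if SEGMENT_SIZE matches
   neither of the two expected layouts.  */
bool
demo_fill_kernarg (char *kernarg, size_t segment_size, void *vars,
                   struct demo_shadow *shadow)
{
  size_t vars_size = sizeof vars;

  if (segment_size < vars_size)
    return false;
  memcpy (kernarg, &vars, vars_size);

  /* A kernel without a shadow argument: nothing more to copy.  */
  if (segment_size == vars_size)
    return true;

  /* Otherwise the reported size must already cover the shadow pointer;
     no extra bytes are added on top of it.  */
  if (segment_size != vars_size + sizeof shadow)
    return false;
  memcpy (kernarg + vars_size, &shadow, sizeof shadow);
  return true;
}
```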

Martin


> +  status = hsa_memory_allocate (agent->kernarg_region, kernarg_size,
> + &shadow->kernarg_address);
>if (status != HSA_STATUS_SUCCESS)
>  hsa_fatal ("Could not allocate memory for HSA kernel arguments", status);
>  
> -- 
> 2.7.1
> 



Re: [HSA, PATCH] Allocate memory for shadow arg (PR hsa/70337)

2016-03-22 Thread Martin Jambor
On Mon, Mar 21, 2016 at 09:51:27PM +0100, Martin Liska wrote:
> On 03/21/2016 07:23 PM, Martin Jambor wrote:
> >This is strange.  The pointer to the shadow data structure is, from
> >the HSA perspective, a normal kernel argument and therefore should
> >already be included in the kernel->kernarg_segment_size.  Have you
> >checked that the values are indeed off?
> 
> Hi Martin.
> 
> You are right that size of a shadow argument pointer should be
> included in the kernel->kernarg_segment_size. I've been currently
> testing a proper patch which conditionally copies shadow argument.
> 
> Thanks,
> Martin
> 

> From 413707c51bf4b0ac7f8dac6421be9955c18767dd Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 21 Mar 2016 21:40:03 +0100
> Subject: [PATCH] Copy shadow argument conditionally (PR hsa/70337)
> 
> libgomp/ChangeLog:
> 
> 2016-03-21  Martin Liska  
> 
>   PR hsa/70337
>   * plugin/plugin-hsa.c (GOMP_OFFLOAD_run): Copy shadow
>   argument just in case a dispatched kernel uses that argument.

This is OK, thanks,

Martin

> ---
>  libgomp/plugin/plugin-hsa.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
> index d888493..f7ef600 100644
> --- a/libgomp/plugin/plugin-hsa.c
> +++ b/libgomp/plugin/plugin-hsa.c
> @@ -1255,8 +1255,16 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, 
> void **args)
>hsa_signal_store_relaxed (s, 1);
>memcpy (shadow->kernarg_address, &vars, sizeof (vars));
>  
> -  memcpy (shadow->kernarg_address + sizeof (vars), &shadow,
> -   sizeof (struct hsa_kernel_runtime *));
> +  /* PR hsa/70337.  */
> +  size_t vars_size = sizeof (vars);
> +  if (kernel->kernarg_segment_size > vars_size)
> +{
> +  if (kernel->kernarg_segment_size != vars_size
> +   + sizeof (struct hsa_kernel_runtime *))
> + GOMP_PLUGIN_fatal ("Kernel segment size has an unexpected value");
> +  memcpy (packet->kernarg_address + vars_size, &shadow,
> +   sizeof (struct hsa_kernel_runtime *));
> +}
>  
>HSA_DEBUG ("Copying kernel runtime pointer to kernarg_address\n");
>  
> -- 
> 2.7.1
> 



Re: [PATCH] Properly assign to packet header (PR hsa/70394)

2016-03-24 Thread Martin Jambor
Hi,

On Thu, Mar 24, 2016 at 12:48:34PM +0100, Martin Liska wrote:
> Hello.
> 
> Following patch initializes whole packet->header field, which is eventually
> stored to a packet in atomic manner. The function mechanism was adopted from
> the HSA runtime manual.
> 
> I've been running bootstrap and regression tests.
> Ready to be installed after it finishes?
> 
> Thanks,
> Martin
> 
> libgomp/ChangeLog:
> 
> 2016-03-24  Martin Liska  
> 
>   * plugin/plugin-hsa.c (packet_store_release): New function
>   that is taken from the HSA runtime manual.
>   (GOMP_OFFLOAD_run): Use the function.

OK, thanks,

Martin
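
For context, a minimal sketch of the release-store idea, assuming the
GCC __atomic_store_n builtin and a hypothetical demo_ name.  That the
header occupies the low 16 bits of the first packet word is an
assumption of this sketch; the HSA runtime manual is the authoritative
source for the layout.

```c
#include <stdint.h>

/* Combine HEADER and REST into one 32-bit word and store it to PACKET
   with release semantics, so that all earlier writes to the remainder
   of the packet are visible before the header is.  */
void
demo_packet_store_release (uint32_t *packet, uint16_t header, uint16_t rest)
{
  __atomic_store_n (packet, (uint32_t) header | ((uint32_t) rest << 16),
                    __ATOMIC_RELEASE);
}
```

Writing the whole word in one store avoids a window in which the
packet processor could see a valid header next to a stale second half.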



Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit

2016-03-24 Thread Martin Jambor
Hi,

On Mon, Mar 21, 2016 at 06:21:02PM +0800, Chung-Lin Tang wrote:
> Hi, this is the set of patches from 
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01411.html
> revised again, this time also with audits for the HSA plugin.
> 
> The changes are pretty minor, mainly that the unload_image hook now
> receives similar error handling treatment.
> 
> Tested again without regressions for nvptx and intelmic, however
> while I was able to build the toolchain with HSA offloading support, I was
> unsure how I could test it, as I currently don't have any AMD hardware (not
> aware if there's an emulator like intelmic).  I would be grateful if
> the HSA folks can run them for me.

I have just tested the whole patch-set on my HSA box (i.e. gomp.exp
tests and all libgomp tests on trunk + some extra testing on the hsa
branch) and found no issues.

I have had a very superficial look over the patch and have no
objections but since I am not familiar with the issue this addresses
and because I do not have a detailed understanding of the internals
of copying data to/from devices, my opinion should not really count
much.

Nevertheless, thanks for thinking about HSA and making me aware of the
change,

Martin


> 
> Thanks,
> Chung-Lin
> 
> ChangeLog for the libgomp proper parts, patch as attached.
> 
> 2016-03-20  Chung-Lin Tang  
> 
> * target.c (gomp_device_copy): New function.
> (gomp_copy_host2dev): Likewise.
> (gomp_copy_dev2host): Likewise.
> (gomp_free_device_memory): Likewise.
> (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
> (gomp_map_pointer): Likewise.
> (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
> NULL value from alloc_func plugin hook.
> (gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
> (gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
> (gomp_unmap_vars): Likewise.
> (gomp_update): Adjust to call gomp_copy_dev2host() and
> gomp_copy_host2dev() functions.
> (gomp_unload_image_from_device): Handle false value from
> unload_image_func plugin hook.
> (gomp_init_device): Handle false value from init_device_func
> plugin hook.
> (gomp_exit_data): Adjust to call gomp_copy_dev2host().
> (omp_target_free): Adjust to call gomp_free_device_memory().
> (omp_target_memcpy): Handle return values from host2dev_func,
> dev2host_func, and dev2dev_func plugin hooks.
> (omp_target_memcpy_rect_worker): Likewise.
> (gomp_target_fini): Handle false value from fini_device_func
> plugin hook.
> * libgomp.h (struct gomp_device_descr): Adjust return type of
> init_device_func, fini_device_func, unload_image_func, free_func,
> dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'.
> * oacc-host.c (host_init_device): Change return type to bool.
> (host_fini_device): Likewise.
> (host_unload_image): Likewise.
> (host_free): Likewise.
> (host_dev2host): Likewise.
> (host_host2dev): Likewise.
> * oacc-mem.c (acc_free): Handle plugin hook fatal error case.
> (acc_memcpy_to_device): Likewise.
> (acc_memcpy_from_device): Likewise.
> (delete_copyout): Add libfnname parameter, handle free_func
> hook fatal error case.
> (acc_delete): Adjust delete_copyout call.
> (acc_copyout): Likewise.
> (update_dev_host): Move gomp_mutex_unlock to after
> host2dev/dev2host hook calls.
> 


Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts

2016-03-24 Thread Martin Jambor
Hi,

On Mon, Mar 21, 2016 at 06:22:17PM +0800, Chung-Lin Tang wrote:
> Hi Martin, I think you're the one to CC for this,
> as I mentioned in the first email, this has been build tested, however I did
> not know if I could test this without a Radeon card.  If convenient,
> could you or anyone familiar with the setup do a make check-target-libgomp
> with this patch series?
> 
> Thanks,
> Chung-Lin
> 
> 
> * plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
> to 'hsa_error_msg', for clarity.
> (hsa_fatal): Likewise.
> (hsa_error): New function.
> (init_hsa_context): Change return type to bool, adjust to return
> false on error.
> (queue_callback): Adjust to call hsa_error.
> (GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
> return value.
> (GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
> return false on error.
> (get_agent_info): Adjust to return NULL on error.
> (destroy_hsa_program): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
> (destroy_module): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_unload_image): Likewise.
> (GOMP_OFFLOAD_fini_device): Likewise.
> (GOMP_OFFLOAD_alloc): Change to return NULL when called.
> (GOMP_OFFLOAD_free): Change to return false when called.
> (GOMP_OFFLOAD_dev2host): Likewise.
> (GOMP_OFFLOAD_host2dev): Likewise.
> (GOMP_OFFLOAD_dev2dev): Likewise.

On the whole, I am fine with the patch but there are two issues:

First, and generally, when you change the return type of a function,
you must document what return values mean in the comment of the
function.  Most importantly, it must be immediately apparent whether a
function returns true or false on failure from its comment.  So please
fix that.
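
As an illustration of the kind of comment I mean, a sketch with
hypothetical demo_ names (not the plugin code):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical status type standing in for hsa_status_t.  */
enum demo_status
{
  DEMO_STATUS_SUCCESS,
  DEMO_STATUS_ERROR
};

/* Report error STR together with STATUS to stderr.  Always return
   false so that callers can write "return demo_error (...)".  */
bool
demo_error (const char *str, enum demo_status status)
{
  fprintf (stderr, "demo error: %s (status %d)\n", str, (int) status);
  return false;
}

/* Initialize hypothetical device number N.  Return true on success and
   false on failure, in which case an error has already been
   reported.  */
bool
demo_init_device (int n)
{
  enum demo_status status
    = n >= 0 ? DEMO_STATUS_SUCCESS : DEMO_STATUS_ERROR;

  if (status != DEMO_STATUS_SUCCESS)
    return demo_error ("Could not initialize device", status);
  return true;
}
```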

Second...

> Index: libgomp/plugin/plugin-hsa.c
> ===
> --- libgomp/plugin/plugin-hsa.c   (revision 234358)
> +++ libgomp/plugin/plugin-hsa.c   (working copy)
> @@ -175,10 +175,10 @@ hsa_warn (const char *str, hsa_status_t status)
>if (!debug)
>  return;
>  
> -  const char *hsa_error;
> -  hsa_status_string (status, &hsa_error);
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
>  
> -  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error);
> +  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, 
> hsa_error_msg);
>  }
>  
>  /* Report a fatal error STR together with the HSA error corresponding to 
> STATUS
> @@ -187,12 +187,25 @@ hsa_warn (const char *str, hsa_status_t status)
>  static void
>  hsa_fatal (const char *str, hsa_status_t status)
>  {
> -  const char *hsa_error;
> -  hsa_status_string (status, &hsa_error);
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
>GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
> -  hsa_error);
> +  hsa_error_msg);
>  }
>  
> +/* Like hsa_fatal, except only report error message, and return FALSE
> +   for propagating error processing to outside of plugin.  */
> +
> +static bool
> +hsa_error (const char *str, hsa_status_t status)
> +{
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
> +  GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str,
> +  hsa_error_msg);
> +  return false;
> +}
> +
>  struct hsa_kernel_description
>  {
>const char *name;

...

>  /* Callback of dispatch queues to report errors.  */
> @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status,
>   hsa_queue_t *queue __attribute__ ((unused)),
>   void *data __attribute__ ((unused)))
>  {
> -  hsa_fatal ("Asynchronous queue error", status);
> +  hsa_error ("Asynchronous queue error", status);
>  }

...I believe this hunk is wrong.  Errors reported in this way mean
that something is very wrong and generally happen during execution of
code on HSA GPU, i.e. within GOMP_OFFLOAD_run.  And since you left
calls in create_single_kernel_dispatch, which is called as a part of
GOMP_OFFLOAD_run, intact, I believe you actually want to leave
hsa_fatal here too.

Thanks,

Martin


Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts

2016-03-29 Thread Martin Jambor
Hi,

On Sun, Mar 27, 2016 at 06:26:29PM +0800, Chung-Lin Tang wrote:
> On 2016/3/25 02:40 AM, Martin Jambor wrote:
> > On the whole, I am fine with the patch but there are two issues:
> > 
> > First, and generally, when you change the return type of a function,
> > you must document what return values mean in the comment of the
> > function.  Most importantly, it must be immediately apparent whether a
> > function returns true or false on failure from its comment.  So please
> > fix that.
> 
> Thanks, I'll update on that.
> 
> >> >  /* Callback of dispatch queues to report errors.  */
> >> > @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status,
> >> >  hsa_queue_t *queue __attribute__ ((unused)),
> >> >  void *data __attribute__ ((unused)))
> >> >  {
> >> > -  hsa_fatal ("Asynchronous queue error", status);
> >> > +  hsa_error ("Asynchronous queue error", status);
> >> >  }
> > ...I believe this hunk is wrong.  Errors reported in this way mean
> > that something is very wrong and generally happen during execution of
> > code on HSA GPU, i.e. within GOMP_OFFLOAD_run.  And since you left
> > calls in create_single_kernel_dispatch, which is called as a part of
> > GOMP_OFFLOAD_run, intact, I believe you actually want to leave
> > hsa_fatal here too.
> 
> Yes, a fatal exit is okay within the 'run' hook, since we're not holding
> the device lock there. I was only trying to audit the
> GOMP_OFFLOAD_init_device() function, where the queues are created.
> 
> I'm not familiar with the HSA runtime API; will the callback only be triggered
> during GPU kernel execution (inside the 'run' hook), and not for example,
> within hsa_queue_create()? If so, then yes as you advised, the above change to
> queue_callback() should be reverted.
> 

The documentation says the callback is "invoked by the HSA runtime for
every asynchronous event related to the newly created queue."  All
enumerated situations when the callback is called happen at command
launch time (i.e. inside a run hook).

Since creation of the queue is a synchronous event, the callback should
not be invoked if it fails.  Of course, the description does not rule
out that such failures occur out of the blue at an arbitrary time.
But I think this is as improbable as a GOMP_PLUGIN_malloc ending up in
a fatal error, which is something you do not seem to be worried
about.

So please revert the hunk.

Thanks,

Martin


Re: [PATCH 1/2] HSA: support alignment for hsa_symbols (PR hsa/70391)

2016-03-31 Thread Martin Jambor
Hi,

this is OK with one small adjustment in a comment:

On Tue, Mar 22, 2016 at 03:51:53PM +0100, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2016-03-23  Martin Liska  
> 
>   PR hsa/70391
>   * hsa-brig.c (emit_directive_variable): Emit alignment
>   according to hsa_symbol::m_align.
>   * hsa-dump.c (hsa_byte_alignment): Move the function to
>   another file.
>   (dump_hsa_symbol): Dump alignment of HSA symbols.
>   * hsa-gen.c (get_symbol_for_decl): Set-up alignment
>   of a symbol.
>   (gen_hsa_addr_with_align): New function.
>   (hsa_bitmemref_alignment): Use newly added function.
>   (gen_hsa_insns_for_load): Likewise.
>   (gen_hsa_insns_for_store): Likewise.
>   (gen_hsa_memory_copy): New argument added.
>   (gen_hsa_insns_for_single_assignment): Respect
>   alignment for assignments processed via
>   gen_hsa_memory_copy.
>   (gen_hsa_insns_for_direct_call): Likewise.
>   (gen_hsa_insns_for_return): Likewise.
>   (gen_function_def_parameters): Set default
>   alignment.
>   * hsa.c (hsa_object_alignment): New function.
>   (hsa_byte_alignment): Pasted function.
>   * hsa.h (hsa_symbol::m_align): New field.
> ---
>  gcc/hsa-brig.c |  5 +---
>  gcc/hsa-dump.c | 13 ++---
>  gcc/hsa-gen.c  | 88 
> +-
>  gcc/hsa.c  | 20 +
>  gcc/hsa.h  |  8 +-
>  5 files changed, 99 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
> index 72eecf9..db39813 100644
> --- a/gcc/hsa-gen.c
> +++ b/gcc/hsa-gen.c
> @@ -169,12 +169,12 @@ hsa_symbol::hsa_symbol ()
>  
>  hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t segment,
>   BrigLinkage8_t linkage, bool global_scope_p,
> - BrigAllocation allocation)
> + BrigAllocation allocation, BrigAlignment8_t align)
>: m_decl (NULL_TREE), m_name (NULL), m_name_number (0),
>  m_directive_offset (0), m_type (type), m_segment (segment),
>  m_linkage (linkage), m_dim (0), m_cst_value (NULL),
>  m_global_scope_p (global_scope_p), m_seen_error (false),
> -m_allocation (allocation), m_emitted_to_brig (false)
> +m_allocation (allocation), m_emitted_to_brig (false), m_align (align)
>  {
>  }
>  
> @@ -908,21 +908,29 @@ get_symbol_for_decl (tree decl)
>  {
>hsa_symbol *sym;
>gcc_assert (TREE_CODE (decl) == VAR_DECL);
> +  BrigAlignment8_t align = hsa_object_alignment (decl);
>  
>if (is_in_global_vars)
>   {
> sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_GLOBAL,
>   BRIG_LINKAGE_PROGRAM, true,
> - BRIG_ALLOCATION_PROGRAM);
> + BRIG_ALLOCATION_PROGRAM, align);
> hsa_cfun->m_global_symbols.safe_push (sym);
>   }
>else
>   {
> +   /* As generation of memory copy instructions relies on alignment
> +  greater or equal to 8 bytes, we need to increase alignment
> +  of all aggregate types.. */

Let's say "efficient memory copy instructions."  It is of course
possible to use slower ones.
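
To show why alignment matters for efficiency only, a minimal sketch
with made-up demo_ names.  It is not the gen_hsa_memory_copy logic; it
just assumes that a copy is emitted as a sequence of power-of-two sized
load/store pairs of at most 8 bytes, each no wider than the known
alignment:

```c
#include <stddef.h>

/* Return the widest chunk in bytes (a power of two, at most 8) usable
   for the next copy step, given a guaranteed alignment of ALIGN bytes
   and REMAINING bytes still to be copied.  */
size_t
demo_chunk_size (size_t align, size_t remaining)
{
  size_t chunk = 8;
  while (chunk > 1 && (chunk > align || chunk > remaining))
    chunk /= 2;
  return chunk;
}

/* Return the number of load/store pairs needed to copy SIZE bytes from
   and to memory aligned to ALIGN bytes.  */
size_t
demo_copy_steps (size_t align, size_t size)
{
  size_t steps = 0;
  while (size > 0)
    {
      size -= demo_chunk_size (align, size);
      steps++;
    }
  return steps;
}
```

Under these assumptions 16 bytes take two steps with 8-byte alignment
but sixteen steps with byte alignment, so the copy remains possible,
just slower.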

Thanks,

Martin



Re: [PATCH 2/2] HSA: handle alignment of string builtins (PR hsa/70391)

2016-03-31 Thread Martin Jambor
Hi,

On Wed, Mar 23, 2016 at 02:43:17PM +0100, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2016-03-23  Martin Liska  
> 
>   PR hsa/70391
>   * hsa-gen.c (hsa_function_representation::update_cfg): New
>   function.
>   (convert_addr_to_flat_segment): Likewise.
>   (gen_hsa_memory_set): New alignment argument.
>   (gen_hsa_ctor_assignment): Likewise.
>   (gen_hsa_insns_for_single_assignment): Provide alignment
>   to gen_hsa_ctor_assignment.
>   (gen_hsa_insns_for_direct_call): Add new argument.
>   (expand_lhs_of_string_op): New function.
>   (expand_string_operation_builtin): Likewise.
>   (expand_memory_copy): New function.
>   (expand_memory_set): New function.
>   (gen_hsa_insns_for_call): Use HOST_WIDE_INT.
>   (convert_switch_statements): Change signature.
>   (generate_hsa): Use a return value of the function.
>   (pass_gen_hsail::execute): Do not call
>   convert_switch_statements here.
>   * hsa-regalloc.c (hsa_regalloc): Call update_cfg.
>   * hsa.h (hsa_function_representation::m_need_cfg_update):
>   New flag.
>   (hsa_function_representation::update_cfg): New function.

As we already discussed, update_cfg and m_need_cfg_update should
really be called differently, because CFG has already been modified
and only dominance needs to be re-computed.  If you haven't thought
about any names yet, what about m_modified_cfg and update_dominance()?
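
Just to show how the suggested names would read, a sketch in plain C
with hypothetical demo_ stand-ins; the counter replaces the real
free_dominance_info and calculate_dominance_info calls:

```c
#include <stdbool.h>

/* Hypothetical stand-in for hsa_function_representation.  */
struct demo_function
{
  /* True if the CFG has been modified since dominance was computed.  */
  bool modified_cfg;
  /* Counts recomputations; stands in for the actual dominance data.  */
  unsigned dominance_recomputations;
};

/* Recompute dominance information of F, but only if its CFG has been
   modified.  The CFG itself is not touched here.  */
void
demo_update_dominance (struct demo_function *f)
{
  if (f->modified_cfg)
    f->dominance_recomputations++;
}
```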


> ---
>  gcc/hsa-gen.c  | 372 
> ++---
>  gcc/hsa-regalloc.c |   1 +
>  gcc/hsa.h  |   9 +-
>  3 files changed, 275 insertions(+), 107 deletions(-)
> 
> diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
> index db39813..db7fc3d 100644
> --- a/gcc/hsa-gen.c
> +++ b/gcc/hsa-gen.c
> @@ -214,7 +214,7 @@ hsa_symbol::fillup_for_decl (tree decl)
> should be set to number of SSA names used in the function.  */
>  
>  hsa_function_representation::hsa_function_representation
> -  (tree fdecl, bool kernel_p, unsigned ssa_names_count)
> +  (tree fdecl, bool kernel_p, unsigned ssa_names_count, bool need_cfg_update)
>: m_name (NULL),
>  m_reg_count (0), m_input_args (vNULL),
>  m_output_arg (NULL), m_spill_symbols (vNULL), m_global_symbols (vNULL),
> @@ -223,7 +223,8 @@ hsa_function_representation::hsa_function_representation
>  m_in_ssa (true), m_kern_p (kernel_p), m_declaration_p (false),
>  m_decl (fdecl), m_internal_fn (NULL), m_shadow_reg (NULL),
>  m_kernel_dispatch_count (0), m_maximum_omp_data_size (0),
> -m_seen_error (false), m_temp_symbol_count (0), m_ssa_map ()
> +m_seen_error (false), m_temp_symbol_count (0), m_ssa_map (),
> +m_need_cfg_update (need_cfg_update)
>  {
>int sym_init_len = (vec_safe_length (cfun->local_decls) / 2) + 1;;
>m_local_symbols = new hash_table  (sym_init_len);
> @@ -319,6 +320,16 @@ hsa_function_representation::init_extra_bbs ()
>hsa_init_new_bb (EXIT_BLOCK_PTR_FOR_FN (cfun));
>  }
>  
> +void
> +hsa_function_representation::update_cfg ()
> +{
> +  if (m_need_cfg_update)
> +{
> +  free_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_DOMINATORS);
> +}
> +}
> +
>  hsa_symbol *
>  hsa_function_representation::create_hsa_temporary (BrigType16_t type)
>  {
> @@ -2246,30 +2257,14 @@ gen_hsa_addr_for_arg (tree tree_type, int index)
>return new hsa_op_address (sym);
>  }
>  
> -/* Generate HSA instructions that calculate address of VAL including all
> -   necessary conversions to flat addressing and place the result into DEST.
> +/* Generate HSA instructions that process all necessary conversions
> +   of an ADDR to flat addressing and place the result into DEST.
> Instructions are appended to HBB.  */
>  
>  static void
> -gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb)
> +convert_addr_to_flat_segment (hsa_op_address *addr, hsa_op_reg *dest,
> +   hsa_bb *hbb)
>  {
> -  /* Handle cases like tmp = NULL, where we just emit a move instruction
> - to a register.  */
> -  if (TREE_CODE (val) == INTEGER_CST)
> -{
> -  hsa_op_immed *c = new hsa_op_immed (val);
> -  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV,
> -  dest->m_type, dest, c);
> -  hbb->append_insn (insn);
> -  return;
> -}
> -
> -  hsa_op_address *addr;
> -
> -  gcc_assert (dest->m_type == hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT));
> -  if (TREE_CODE (val) == ADDR_EXPR)
> -val = TREE_OPERAND (val, 0);
> -  addr = gen_hsa_addr (val, hbb);
>hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_LDA);
>insn->set_op (1, addr);
>if (addr->m_symbol && addr->m_symbol->m_segment != BRIG_SEGMENT_GLOBAL)
> @@ -2298,6 +2293,34 @@ gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb)
>  }
>  }
>  
> +/* Generate HSA instructions that calculate address of VAL including all
> +   necessary conversions to flat

Re: [PATCH 2/2] Fix PR hsa/70402

2016-04-01 Thread Martin Jambor
Hi,

On Thu, Mar 31, 2016 at 12:50:54PM +0200, Martin Liska wrote:
> On 03/29/2016 01:44 PM, Martin Liška wrote:
> > Second part of the patch set which omits one split_block (compared to the 
> > original patch).
> > Acceptable just in case the first part will be accepted.
> > 
> > Thanks
> > Martin
> > 
> 
> Hi.
> 
> I'm sending v3 of the patch which does not immediately update dominator,
> but sets a flag that eventually triggers the update.
> 

The patch is OK after you change the name of the flag (introduced in a
different patch) to the new one.

Thanks,

Martin


Re: Splitting up gcc/omp-low.c?

2016-04-08 Thread Martin Jambor
Hi,

On Fri, Apr 08, 2016 at 11:36:03AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek  wrote:
> > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > >
> > > >In addition to that, how about we split up gcc/omp-low.c into several
> > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > >along the borders of the several passes defined therein?  Or, can you
> > > >tell already that there would be too many cross-references between the
> > > >several files to make this infeasible?
> > > 
> > > > It would be nice to get rid of all the code duplication in that file. That
> > > > alone could reduce the size by quite a bit, and hopefully make it easier
> > > > to read.
> > 
> > What exact code duplication do you mean?
> 
> (Has been discussed in the following.)  At this point, I do not intend to
> work on any kinds of cleanup, but rather just the "mechanical" changes:
> 
> > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > achieve.
> > 
> > Yeah, that might be the possible splitting boundary (have omp-low.c,
> > omp-exp.c).
> 
> Right.  And possibly some kind of omp-simd.c, and omp-checking.c, and so
> on, if feasible.  (I have not yet looked in detail.)
> 
> > > >I'd suggest to do this shortly before GCC 6 is released, so that
> > > >backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> > > >have to care for gcc-5-branch backports too much any longer.)
> > > 
> > > > I'll declare myself agnostic as to whether such a change is appropriate
> > > > for gcc-6 at this stage. I guess it kind of depends on the specifics.
> > 
> > Certainly.  On one side I'd say it is too late now in stage3, on the other
> > side when would be better time to do that, during stage1 people will have
> > more likely out of the tree branches with more changes (I'm aware we even
> > now have the HSA, OpenMP -> PTX and OpenACC branches).
> > 
> > So, if somebody wants to try that, we can see if the result would be
> > appropriate.
> 
> So, has time now come to execute this task?  (To remind: the idea
> explicitly has been to do this late, shortly before the gcc-6-branch gets
> created, to make it easy in the following months to apply patches to both
> trunk and gcc-6-branch.)
> 

Unless someone is quicker, I can give it a go next Thursday (not any
sooner, unfortunately).  I would do a division into omp-low.c and
omp-exp.c and possibly an omp.c for simple stuff not fitting anywhere
else and perhaps even a separate omp-gridify.c.  Someone else would
have to put stuff into an omp-simd.c, I'm afraid.  But we can go
about this incrementally.

Thanks,

Martin


[patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-08 Thread Martin Jambor
Hi,

I ran into an ICE when compiling the following function on the HSA branch:

foo (int n, int m, int o, int (*a)[m][o])
{
  int i, j, k;
#pragma omp target teams distribute parallel for shared(a) firstprivate(n, m, o) private(i,j,k)
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      for (k = 0; k < o; k++)
        a[i][j][k] = i + j + k;
}

The problem is that when I duplicate the loop with
copy_gimple_seq_and_replace_locals, I get one extra re-mapping.
Specifically, I feed the function this:

{
  int i.2;

  #pragma omp teams shared(a) firstprivate(n) firstprivate(m) firstprivate(o) shared(m.1) shared(D.3275) shared(o.0)
{
  #pragma omp distribute private(i.2)
  for (i.2 = 0; i.2 < n; i.2 = i.2 + 1)
{
  #pragma omp parallel shared(a) firstprivate(n) firstprivate(m) firstprivate(o) private(i) private(j) private(k) shared(m.1) shared(D.3275) shared(o.0)
{
  sizetype D.3286;
  long unsigned int D.3287;
  sizetype D.3288;
  sizetype D.3289;
  sizetype D.3290;
  long unsigned int D.3291;
  long unsigned int D.3292;
  int[0:D.3279][0:D.3271] * D.3293;
  int D.3294;
  int D.3295;

  #pragma omp for nowait
  for (i = 0; i < n; i = i + 1)
{
  j = 0;
  goto ;
  :
  k = 0;
  goto ;
  :
  D.3286 = D.3275 /[ex] 4;   <--- here I get wrong decl
  D.3287 = (long unsigned int) i;
  D.3288 = (sizetype) o.0;
  D.3289 = (sizetype) m.1;
  D.3290 = D.3288 * D.3289;
  D.3291 = D.3287 * D.3290;
  D.3292 = D.3291 * 4;
  D.3293 = a + D.3292;
  D.3294 = i + j;
  D.3295 = D.3294 + k;
  *D.3293[j]{lb: 0 sz: D.3286 * 4}[k] = D.3295;
  k = k + 1;
  :
  if (k < o) goto ; else goto ;
  :
  j = j + 1;
  :
  if (j < m) goto ; else goto ;
  :
}
}
}
}
}

and it replaces D.3275 with its new copy with undefined value.  The
mapping is created when an array type where the size is defined in
terms of that variable declaration is copied.  The comment in
type-remapping code says that we "use the already remaped data" but
that is not true.

My solution was to prevent declaration duplication in this case with
yet another state variable in struct copy_body_data that holds a
special value when we are running copy_gimple_seq_and_replace_locals
and another when we are within type-remapping.

I'll be happy for any suggestion on how to deal with this without
cluttering copy_body_data even more, but so far I have not found any.
If nobody has a better idea, is the following good for trunk?  (I am
about to commit it to the hsa branch.)  It has passed bootstrap and
testing on x86_64-linux.

Thanks,

Martin


2016-01-06  Martin Jambor  

* tree-inline.h (copy_body_data): New field
decl_creation_prevention_level.  Moved remap_var_for_cilk to minimize
padding.
* tree-inline.c (remap_decl): Return original decls if
decl_creation_prevention_level is two or bigger.
(remap_type_1): Increment and decrement decl_creation_prevention_level
if appropriate.
(copy_gimple_seq_and_replace_locals): Set
decl_creation_prevention_level to 1.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 88a6753..2df11a2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -340,8 +340,20 @@ remap_decl (tree decl, copy_body_data *id)
   return decl;
 }
 
-  /* If we didn't already have an equivalent for this declaration,
- create one now.  */
+  /* If decl copying is forbidden (which happens when copying a type with size
+ defined outside of the copied sequence) work with the original decl. */
+  if (!n
+  && id->decl_creation_prevention_level > 1
+  && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL))
+{
+  if (id->do_not_unshare)
+   return decl;
+  else
+   return unshare_expr (decl);
+}
+
+  /* If we didn't already have an equivalent for this declaration, create one
+ now.  */
   if (!n)
 {
   /* Make a copy of the variable or label.  */
@@ -526,7 +538,10 @@ remap_type_1 (tree type, copy_body_data *id)
   gcc_unreachable ();
 }
 
-  /* All variants of type share the same size, so use the already remaped data.  */
+  /* All variants of type share the same size, so use the already remaped
+ data.  */
+  if (id->decl_creation_prevention_level > 0)
+id->decl_creation_prevention_level++;
   if (TYPE_MAIN

[PR ipa/66616] Fix artificial thunk ABI issues

2016-01-08 Thread Martin Jambor
Hi,

i386 -m32 failure of the PR 66616 testcase was caused by the fact
that, on the callee side, the calling conventions of a thunk are
decided according to the properties of the function it is associated
with, but on the caller side, the actual thunk is examined.  Since
they depend on the can_change_signature cgraph_node flag and the flag
of artificial thunks has not been copied from the function, the caller
and callee could disagree on ABI.

Fixed thusly, by copying the flag to the artificial thunk.  Testcase
is already in the testsuite (g++.dg/ipa/pr66616.C).  The patch has
successfully passed bootstrap and testing on i686-linux, I have also
included it in a bootstrap and testing that is underway on
x86_64-linux.  OK if it passes there as well?
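
The disagreement can be shown with a small toy model.  This is only a
sketch under stated assumptions: node, abi, pick_abi and make_thunk are
hypothetical stand-ins, not the cgraph or i386 back-end code, and the
uncopied flag is assumed to default to false.

```cpp
#include <cassert>

// Hypothetical stand-in for a cgraph_node; only the flag is modelled.
struct node
{
  bool can_change_signature;
};

enum class abi { standard, local_optimized };

// Hypothetical stand-in for the target's choice of calling convention,
// which depends on the flag.
abi pick_abi (const node &n)
{
  return n.can_change_signature ? abi::local_optimized : abi::standard;
}

// Create an artificial thunk for FUNC.  COPY_FLAG selects the fixed
// behaviour; without it the flag keeps its (assumed) default of false.
node make_thunk (const node &func, bool copy_flag)
{
  node thunk = { false };
  if (copy_flag)
    thunk.can_change_signature = func.can_change_signature;
  return thunk;
}

// The callee side decides by the associated function, the caller side
// by the thunk itself.  They agree only if both give the same answer.
bool caller_and_callee_agree (const node &func, const node &thunk)
{
  return pick_abi (func) == pick_abi (thunk);
}
```

For a function with the flag set, the thunk made without copying
disagrees with the function, while the thunk made with copying agrees.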

Thanks,

Martin


[PR ipa/66616] Copy can_change_signature flag to artificial thunks

2016-01-07  Martin Jambor  

PR ipa/66616
* cgraphclones.c (duplicate_thunk_for_node): Copy can_change_signature
flag.

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index f8a7d37..8759ce4 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -328,6 +328,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
   new_thunk = cgraph_node::create (new_decl);
   set_new_clone_decl_and_node_flags (new_thunk);
   new_thunk->definition = true;
+  new_thunk->local.can_change_signature = node->local.can_change_signature;
   new_thunk->thunk = thunk->thunk;
   new_thunk->unique_name = in_lto_p;
   new_thunk->former_clone_of = thunk->decl;


[PR 69044] Do not clone for parameter removal when !can_change_signature

2016-01-08 Thread Martin Jambor
Hi,

we generally do not have the ability to propagate constants to and
clone CHKP instrumented functions.  Therefore we do not propagate
stuff into their lattices but since Honza changed cloning for all
contexts heuristics a few weeks ago, we might attempt to clone them
for unused parameter removal, which then leads to an ICE (and all
sorts of issues).

The heuristics, however, should not attempt to do that because the
function's cgraph_node has its can_change_signature flag cleared.  So
this patch changes it accordingly.  Bootstrapped and tested on
x86_64-linux, OK for trunk?

Thanks,

Martin


2016-01-08  Martin Jambor  

PR ipa/69044
* ipa-cp.c (estimate_local_effects): Do not clone for removal of
useless parameters if we cannot change function signature.

testsuite/
* gcc.target/i386/chkp-pr69044.c: New test.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 782df71..d99e69c 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2518,7 +2518,8 @@ estimate_local_effects (struct cgraph_node *node)
   known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
   int devirt_bonus = devirtualization_time_bonus (node, known_csts,
   known_contexts, known_aggs_ptrs);
-  if (always_const || devirt_bonus || removable_params_cost)
+  if (always_const || devirt_bonus
+  || (removable_params_cost && node->local.can_change_signature))
 {
   struct caller_statistics stats;
   inline_hints hints;
diff --git a/gcc/testsuite/gcc.target/i386/chkp-pr69044.c b/gcc/testsuite/gcc.target/i386/chkp-pr69044.c
new file mode 100644
index 000..933e88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/chkp-pr69044.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mpx } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
+
+int i;
+int strncasecmp (char *p1, char *p2, long p3) { return 0; }
+int special_command ()
+{
+  if (strncasecmp (0, 0, 0))
+i++;
+}


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor
Hi,

On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> > > > --- a/libgomp/task.c
> > > > +++ b/libgomp/task.c
> > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > >ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > > +  free (ttask->firstprivate_copies);
> > > >gomp_target_task_completion (team, task);
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > 
> > > > So, this function should have a special case for the SHARED_MEM case,
> > > > handle it closely to say how GOMP_taskgroup_end handles the
> > > > finish_cancelled: case.  Just note that the target task is missing from
> > > > certain queues at that point.
> > 
> > I'm afraid I need some help here.  I do not quite understand how is
> > finish_cancelled in GOMP_taskgroup_end similar, it seems to be doing
> > much more than freeing one pointer.  What is exactly the issue with
> > the above?
> > 
> > Nevertheless, after reading through bits of task.c again, I wonder
> > whether any copying (for both shared memory target and the host) in
> > gomp_target_task_fn is actually necessary because it seems to be also
> > done in gomp_create_target_task.  Does that not apply somehow?
> 
> The target task is scheduled for the first action as normal task, and the
> scheduling of it already removes it from some of the queues (each task is
> put into 1-3 queues), i.e. actions performed mostly by
> gomp_task_run_pre.  Then the team task lock is unlocked and the task is run.
> Finally, for normal tasks, gomp_task_run_post_handle_depend,
> gomp_task_run_post_remove_parent, etc. is run.  Now, for async target tasks
> that have something running on some other device at that point, we don't do
> that, but instead make it GOMP_TASK_ASYNC_RUNNING.  And continue with other
> stuff, until gomp_target_task_completion is run.
> For non-shared mem that needs to readd the task again into the queues, so
> that it will be scheduled again.  But you don't need that for shared mem
> target tasks, they can just free the firstprivate_copies and finalize the
> task.
> At the time gomp_target_task_completion is called, the task is pretty much
> in the same state as it is around the finish_cancelled:; label.
> So instead of what the gomp_target_task_completion function does,
> you would for SHARED_MEM do something like:
>   size_t new_tasks
> = gomp_task_run_post_handle_depend (task, team);
>   gomp_task_run_post_remove_parent (task);
>   gomp_clear_parent (&task->children_queue);
>   gomp_task_run_post_remove_taskgroup (task);
>   team->task_count--;
> do_wake = 0;
>   if (new_tasks > 1)
> {
>   do_wake = team->nthreads - team->task_running_count
> - !task->in_tied_task;
>   if (do_wake > new_tasks)
> do_wake = new_tasks;
> }
> // Unlike other places, the following will be also run with the
> // task_lock held, but I'm afraid there is nothing to do about it.
> // See the comment in gomp_target_task_completion.
> gomp_finish_task (task);
> free (task);
> if (do_wake)
>   gomp_team_barrier_wake (&team->barrier, do_wake);
> 
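
For reference, the wake-up arithmetic in the quoted snippet can be
isolated into a small function.  This is a sketch only: threads_to_wake
and its plain-int parameters are hypothetical and stand in for the
fields of the team and task structures in libgomp.

```cpp
#include <cassert>

// Number of threads to wake once a finished task has made NEW_TASKS
// dependent tasks runnable, following the quoted snippet.
int
threads_to_wake (int new_tasks, int nthreads, int task_running_count,
                 bool in_tied_task)
{
  int do_wake = 0;
  if (new_tasks > 1)
    {
      // Threads not currently running a task, minus one when the
      // finished task was not inside a tied task.
      do_wake = nthreads - task_running_count - !in_tied_task;
      // Never wake more threads than there are new tasks.
      if (do_wake > new_tasks)
        do_wake = new_tasks;
    }
  return do_wake;
}
```

With 8 threads, 2 of them running tasks and 4 new tasks, the raw count
of 5 is capped at 4; with a single new task nothing is woken.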

I tried the above but libgomp testcase target-33.c always got stuck
within the GOMP_taskgroup_end call, more specifically in
gomp_team_barrier_wait_end in config/linux/bar.c, where the first
call to gomp_barrier_handle_tasks left barrier->generation as
BAR_WAITING_FOR_TASK and then nothing ever happened, even as the
callbacks fired.

After looking into the tasking mechanism for basically the whole day
yesterday, I *think* I fixed it by calling
gomp_team_barrier_set_task_pending from the callback and another hunk
in gomp_barrier_handle_tasks so that it clears that barrier flag even
if it has not picked up any tasks.  Please let me know if you think it
makes sense.

If so, I'll include it in an HSA patch set I hope to generate today.
Otherwise I guess I'd prefer to remove the shared-memory path and
revert to old behavior as a temporary measure until we find out what
was wrong.

Thanks and sorry that it took me so long to resolve this,

Martin


diff --git a/libgomp/task.c b/libgomp/task.c
index ab5df51..828c1fb 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, 
str

Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Martin Jambor
Hi,

On Mon, Jan 11, 2016 at 05:38:47PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 11, 2016 at 09:41:31AM +0100, Richard Biener wrote:
> > Hum.  Can't you check id->remapping_type_depth?

For some reason, last week I reached the conclusion that no.  But I
must have done something wrong because I have tested it again today
and just never creating a new decl in remap_decl if
id->remapping_type_depth is non zero is good enough for my testcase
and it survives bootstrap and testing too (previously I thought it did
not).
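
In toy form, the check amounts to the following.  This is a sketch under
stated assumptions, not the tree-inline.c code: decls are modelled as
strings, a "copy" is the name with ".copy" appended, and copy_state and
remap_type_size are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of copy_body_data.
struct copy_state
{
  std::map<std::string, std::string> decl_map;  // decls remapped so far
  int remapping_type_depth = 0;                 // non-zero inside a type
};

std::string
remap_decl (const std::string &decl, copy_state &id)
{
  auto it = id.decl_map.find (decl);
  if (it != id.decl_map.end ())
    return it->second;          // reuse the existing mapping
  if (id.remapping_type_depth != 0)
    return decl;                // inside a type: never create a new decl
  return id.decl_map[decl] = decl + ".copy";
}

// Remap the decl that a variable-length type uses as its size.
std::string
remap_type_size (const std::string &size_decl, copy_state &id)
{
  ++id.remapping_type_depth;
  std::string result = remap_decl (size_decl, id);
  --id.remapping_type_depth;
  return result;
}
```

A size decl defined outside the copied sequence (D.3275 in my testcase)
stays as it is, while one that has already been remapped as a local
still resolves to its copy.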

id->remapping_type_depth seems to be incremented for DECL_VALUE_EXPR
as well, so it actually might help in that situation too.

> That said, how do
> > we end up recursing into remap_decl when copying the variable length
> > decl/type?  Can't we avoid the recursion (basically avoid remapping
> > variable-size types at all?)

Here I agree with Jakub that there are situations where we have to.
There is a comment towards the end of remap_type_1 saying that when
remapping types, all required decls should have already been mapped.
If that is correct, and I believe it is, the remapping_type_depth test
should be fine.

> 
> I guess it depends, VLA types that refer in their various gimplified
> expressions only to decls defined outside of bind stmts we are duplicating
> are fine as is, they don't need remapping, or could be remapped to VLA types
> that use all the same temporary decls.
> VLAs that have some or all references to decls inside of the bind stmts
> we are duplicating IMHO need to be remapped.
> So, perhaps we need to remap_decls in replace_locals_stmt in two phases
> in presence of VLAs (or also vars with DECL_VALUE_EXPR)

I'm a bit worried what would happen to local DECLs that are pointers
to VLAs, because...

> - phase 1 would just walk the
>   for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var))
> {
>   if (!can_be_nonlocal (old_var, id)
> && ! variably_modified_type_p (TREE_TYPE (old_var), id->src_fn))

...variably_modified_type_p seems to return true for them and...

>   remap_decl (old_var, id);
> }
> - phase 2 - do the full remap_decls, but during that arrange that
>   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
>   decl

...they would not be copied here because remap_decl would not be
duplicating stuff.  So I'd end up with an original local decl when I
actually need a duplicate.

But let me go with just checking the remapping_type_depth for now.

Thanks for looking into this,

Martin


> That way, I think if the types refer to some temporaries that are defined
> in the bind stmts being copied, they will be properly duplicated, otherwise
> they will be shared.
> So, we'd need some flag in *id (just bool bitfield would be enough) that would
> allow replace_locals_stmt to set it before the remap_decls call in phase 2
> and clear it afterwards, and use that flag together with
> id->remapping_type_depth in remap_decls.
> 
>   Jakub


Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Martin Jambor
On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> > remap_decl (old_var, id);
> > }
> > - phase 2 - do the full remap_decls, but during that arrange that
> >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> >   decl
> 
> ...they would not be copied here because remap_decl would not be
> duplicating stuff.  So I'd end up with an original local decl when I
> actually need a duplicate.
> 

ugh, I'm trying to be too fast and obviously forgot about the
id->remapping_type_depth part of the proposed condition.

Still, when could relying solely on id->remapping_type_depth fail?

Sorry for the noise,

Martin


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor
Hi,

On Tue, Jan 12, 2016 at 02:38:15PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote:
> > GOMP_kernel_launch_attributes should not be there (it is a
> > reminiscence from before the device-specific target arguments) and
> > should be moved just to the HSA plugin.  I'll prepare a patch today.
> > 
> > While we do not have to share GOMP_hsa_kernel_dispatch, we actually do
> > use them in both the plugin and the compiler, where we only use it in
> > an offsetof, so that we only have the structure defined once.
> 
> But, even using it in offsetof might be wrong, the compiler could be a
> cross-compiler, and you'd use offsetof on the host, while you want it for
> the target, and that would be different.
> So, IMHO you need (unless you already have) built the structure as a tree
> type, lay it out, and then you can use at TYPE_SIZE_UNIT or
> DECL_FIELD_OFFSET and the like.
> 

I see. For now I have just put a FIXME there but have talked to Martin
about laying out the type properly.  This is what I have committed to
the branch.
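
To make the cross-compilation concern concrete, here is a small sketch
that computes field offsets from target sizes instead of host offsetof.
It is only an illustration, not what a tree-based layout in GCC would
look like: field, layout_offsets and dispatch_fields are hypothetical,
and integers are assumed to be naturally aligned, which not every ABI
guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One field of the structure: its size and alignment on the target.
struct field
{
  std::size_t size;
  std::size_t align;
};

// Return the offset of every field, placing fields in order and
// rounding each start position up to the field's alignment.
std::vector<std::size_t>
layout_offsets (const std::vector<field> &fields)
{
  std::vector<std::size_t> offsets;
  std::size_t pos = 0;
  for (const field &f : fields)
    {
      pos = (pos + f.align - 1) / f.align * f.align;  // round up
      offsets.push_back (pos);
      pos += f.size;
    }
  return offsets;
}

// Fields of GOMP_hsa_kernel_dispatch in declaration order for a target
// whose pointers are PTR bytes wide (assumed naturally aligned).
std::vector<field>
dispatch_fields (std::size_t ptr)
{
  const field p = { ptr, ptr }, u64 = { 8, 8 }, u32 = { 4, 4 };
  return { p, p, p, u64, u64, u32, u32, u64, u32, u64, u64, p };
}
```

Under these assumptions the "object" field (index 3) sits at offset 24
with 8-byte pointers but at 16 with 4-byte pointers, so an offsetof
evaluated on the host is only right when host and target layouts match.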

Thanks,

Martin

2016-01-12  Martin Jambor  

include/
* gomp-constants.h (GOMP_kernel_launch_attributes): Removed.
(GOMP_hsa_kernel_dispatch): Likewise.

libgomp/
* plugin/plugin-hsa.c (GOMP_kernel_launch_attributes): Moved here.
(GOMP_hsa_kernel_dispatch): Likewise.

gcc/
* hsa-gen.c (GOMP_hsa_kernel_dispatch): Moved here.
---
 gcc/hsa-gen.c   | 35 +
 include/gomp-constants.h| 44 --
 libgomp/plugin/plugin-hsa.c | 47 +
 3 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 1715b57..f633dfd 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3747,6 +3747,41 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   hbb->append_insn (basic);
 }
 
+/* Collection of information needed for a dispatch of a kernel from a
+   kernel.  Keep in sync with libgomp's plugin-hsa.c.
+
+   FIXME: In order to support cross-compilations, we need to lay ot the type as
+   a tree and then use field_decl positions.
+ */
+
+struct GOMP_hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Levels-var ICV.  */
+  uint64_t omp_level;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct GOMP_hsa_kernel_dispatch **children_dispatches;
+};
+
 /* Return an HSA register that will contain number of threads for
a future dispatched kernel.  Instructions are added to HBB.  */
 
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 1dae474..a8e7723 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -256,48 +256,4 @@ enum gomp_map_kind
 /* Identifiers of device-specific target arguments.  */
 #define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES  (1 << 8)
 
-/* Structure describing the run-time and grid properties of an HSA kernel
-   lauch.  */
-
-struct GOMP_kernel_launch_attributes
-{
-  /* Number of dimensions the workload has.  Maximum number is 3.  */
-  uint32_t ndim;
-  /* Size of the grid in the three respective dimensions.  */
-  uint32_t gdims[3];
-  /* Size of work-groups in the respective dimensions.  */
-  uint32_t wdims[3];
-};
-
-/* Collection of information needed for a dispatch of a kernel from a
-   kernel.  */
-
-struct GOMP_hsa_kernel_dispatch
-{
-  /* Pointer to a command queue associated with a kernel dispatch agent.  */
-  void *queue;
-  /* Pointer to reserved memory for OMP data struct copying.  */
-  void *omp_data_memory;
-  /* Pointer to a memory space used for kernel arguments passing.  */
-  void *kernarg_address;
-  /* Kernel object.  */
-  uint64_t object;
-  /* Synchronization signal used for dispatch synchronization.  */
-  uint64_t signal;
-  /* Private segment size.  */
-  uint32_t private_segment_size;
-  /* Group segment size.  */
-  uint32_t group_segment_size;
-  /* Number of children kernel dispatches.  */
-  uint64_t kernel_dispatch_count;
-  /* Number of threads.  */
-  uint32_t omp_num_threads;
-  /* Debug purpose argument.  */
-  uint64_t debug;
-  /* Levels-var

[hsa merge 01/10] Configury changes and new options

2016-01-13 Thread Martin Jambor
Hi,

this patch contains changes to the configuration mechanism and offload
bits, so that users can build compilers with HSA support.

It is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which has
already been approved by Jakub after a few changes
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html).
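
The way the configure.ac hunk of this patch treats hsa differently from
other accelerators can be modelled as follows.  This is a sketch that
mirrors the shell loop in C++ purely for illustration: offload_config
and classify_offload_targets are hypothetical names, and joining the
target names with commas is taken from the OFFLOAD_TARGETS description.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct offload_config
{
  bool enable_hsa = false;         // some target name starts with "hsa"
  bool enable_offloading = false;  // some other accelerator was listed
  std::string offload_targets;     // names with any "=..." suffix removed
};

offload_config
classify_offload_targets (const std::string &enable_offload_targets)
{
  offload_config cfg;
  std::istringstream in (enable_offload_targets);
  std::string tgt;
  while (std::getline (in, tgt, ','))
    {
      if (tgt.empty ())
        continue;
      tgt = tgt.substr (0, tgt.find ('='));  // like sed 's/=.*//'
      if (tgt.compare (0, 3, "hsa") == 0)    // like grep "^hsa"
        cfg.enable_hsa = true;
      else
        cfg.enable_offloading = true;
      if (!cfg.offload_targets.empty ())
        cfg.offload_targets += ',';
      cfg.offload_targets += tgt;
    }
  return cfg;
}
```

Listing only hsa therefore leaves enable_offloading unset, which is what
lets ENABLE_OFFLOADING stay undefined for an HSA-only compiler.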

thanks,

Martin

2016-01-13  Martin Jambor  

* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include, --with-hsa-runtime-lib and
--with-hsa-kmt-lib.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa accelerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.

libgomp/plugin/
* Makefrag.am: Add HSA plugin requirements.
* configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 44a18eb..ab9cbbf 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1297,6 +1297,11 @@ OBJS = \
graphite-sese-to-poly.o \
gtype-desc.o \
haifa-sched.o \
+   hsa.o \
+   hsa-gen.o \
+   hsa-regalloc.o \
+   hsa-brig.o \
+   hsa-dump.o \
hw-doloop.o \
hwint.o \
ifcvt.o \
@@ -1321,6 +1326,7 @@ OBJS = \
ipa-icf.o \
ipa-icf-gimple.o \
ipa-reference.o \
+   ipa-hsa.o \
ipa-ref.o \
ipa-utils.o \
ipa.o \
@@ -2404,6 +2410,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
+  $(srcdir)/hsa.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 49d347c..23e6ed7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -239,6 +239,10 @@ Inserts call to __sanitizer_cov_trace_pc into every basic block.
 Variable
 bool dump_base_name_prefixed = false
 
+; Flag whether HSA generation has been explicitely disabled
+Variable
+bool flag_disable_hsa = false
+
 ###
 Driver
 
@@ -593,6 +597,10 @@ Wfree-nonheap-object
 Common Var(warn_free_nonheap_object) Init(1) Warning
 Warn when attempting to free a non-heap object.
 
+Whsa
+Common Var(warn_hsa) Init(1) Warning
+Warn when a function cannot be expanded to HSAIL.
+
 Winline
 Common Var(warn_inline) Warning
 Warn when an inlined function cannot be inlined.
diff --git a/gcc/config.in b/gcc/config.in
index c00cd0f..c3340bb0 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -144,6 +144,12 @@
 #endif
 
 
+/* Define this to enable support for generating HSAIL. */
+#ifndef USED_FOR_TARGET
+#undef ENABLE_HSA
+#endif
+
+
 /* Define if gcc should always pass --build-id to linker. */
 #ifndef USED_FOR_TARGET
 #undef ENABLE_LD_BUILDID
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0a626e9..8d3a869 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -940,6 +940,13 @@ AC_SUBST(accel_dir_suffix)
 
 for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   tgt=`echo $tgt | sed 's/=.*//'`
+
+  if echo "$tgt" | grep "^hsa" > /dev/null ; then
+enable_hsa=1
+  else
+enable_offloading=1
+  fi
+
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
@@ -948,7 +955,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
 done
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
   [Define to offload targets, separated by commas.])
-if test x"$offload_targets" != x; then
+if test x"$enable_offloading" != x; then
   AC_DEFINE(ENABLE_OFFLOADING, 1,
 [Define this to enable supp

[hsa merge 06/10] Pass manager changes

2016-01-13 Thread Martin Jambor
Hi,

the pass manager changes required for HSA have already been committed
to trunk so all that remains are these additions to the pass pipeline.

This bit has already been approved by Richi in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00996.html

Thanks,

Martin


2016-01-13  Martin Jambor  
Martin Liska  

* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.

diff --git a/gcc/passes.def b/gcc/passes.def
index c593851..a6a1719 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -151,6 +151,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_target_clone);
+  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -388,6 +389,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
+  NEXT_PASS (pass_gen_hsail);
 
   NEXT_PASS (pass_expand);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index e8e8e48..b942a01 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -471,6 +471,7 @@ extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@@ -495,6 +496,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);



[hsa merge 00/10] Merge of HSA branch

2016-01-13 Thread Martin Jambor
Hi,

this is hopefully the last big re-post of the HSA patches.  We have
incorporated all the feedback and found and fixed a couple more bugs.
The complete patch-set bootstraps and tests fine on x86_64-linux when
HSA is not enabled.  When HSA is enabled, there are a few expected
warnings, which I will address as a followup together with more
testsuite changes.  The patches were specifically designed so that the
impact on people not enabling HSA should be minimal.  A last round of
complete testing on an actual HSA-capable APU is still underway and I
won't have the results until tomorrow, but preliminary results were
good and I did not want to hold up these patches any longer.

The libgomp, omp and configuration bits have been reviewed by Jakub, a
few other bits by Richi, but still Honza should review the IPA parts
and I suppose someone other than me should ack the hsa-* files, even
though I probably now have the authority to do it myself.

Thanks everybody for patience and feedback.  While we are of course
open to more of it, let's also hope the approval process will
finish soon, as it should now.

Martin



[hsa merge 02/10] Modifications to libgomp proper

2016-01-13 Thread Martin Jambor
Hi,

The patch below contains all changes to libgomp files except for the
hsa plugin (which is in the following patch).

The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01288.html but we have
incorporated a number of requests from the feedback.  From the
subsequent communications with Jakub, I have the feeling he is fine
with the changes.  But perhaps he or someone else would like to have
one more look.
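
For readers who want to see how the new target-argument words are
meant to be taken apart, here is a minimal sketch.  The macro values
are copied from the gomp-constants.h hunk of this patch; the helper
functions are hypothetical and are not part of libgomp.  The exact
meaning of the SUBSEQUENT_PARAM bit is assumed from the macro name
(the value travels in the next array element).

```c
#include <stdint.h>

/* Values copied from the gomp-constants.h hunk of this patch.  */
#define GOMP_DEVICE_HSA                   7
#define GOMP_TARGET_ARG_DEVICE_MASK       ((1 << 7) - 1)
#define GOMP_TARGET_ARG_DEVICE_ALL        0
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM  (1 << 7)
#define GOMP_TARGET_ARG_ID_MASK           (((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS         (1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT      (2 << 8)

/* Hypothetical helper: device the argument is intended for
   (low seven bits).  */
static int
target_arg_device (uintptr_t arg)
{
  return (int) (arg & GOMP_TARGET_ARG_DEVICE_MASK);
}

/* Hypothetical helper: value identifier (bits 8 to 15), directly
   comparable with GOMP_TARGET_ARG_NUM_TEAMS and friends.  */
static int
target_arg_id (uintptr_t arg)
{
  return (int) (arg & GOMP_TARGET_ARG_ID_MASK);
}

/* Hypothetical helper: nonzero if ARG is significant for DEVICE,
   either because it names it or because it is meant for all.  */
static int
target_arg_applies_to (uintptr_t arg, int device)
{
  int d = target_arg_device (arg);
  return d == GOMP_TARGET_ARG_DEVICE_ALL || d == device;
}

/* Hypothetical helper: nonzero if the SUBSEQUENT_PARAM flag is set,
   assumed to mean the value is stored in the following element.  */
static int
target_arg_value_follows (uintptr_t arg)
{
  return (arg & GOMP_TARGET_ARG_SUBSEQUENT_PARAM) != 0;
}
```

A plugin would presumably walk the args array, skip entries for which
target_arg_applies_to returns zero and switch on target_arg_id for
the rest.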

Thanks,

Martin


2016-01-13  Martin Jambor  

include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

libgomp/
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New fields firstprivate_copies and
args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Update prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (calculate_firstprivate_requirements): New function.
(copy_firstprivate_data): Likewise.
(gomp_target_fallback_firstprivate): Use them.
(gomp_target_unshare_firstprivate): New function.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices.  Do not
pass any args to plugins.
(GOMP_target_ext): Introduce device-specific argument parameter args.
Allow host fallback if device shares memory.  Do not remap data if
device has shared memory.
(gomp_target_task_fn): Likewise.  Also treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.

liboffloadmic/plugin
* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused
parameter.
(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index dffd631..a8e7723 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -176,6 +176,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NOT_HOST   4
 #define GOMP_DEVICE_NVIDIA_PTX 5
 #define GOMP_DEVICE_INTEL_MIC  6
+#define GOMP_DEVICE_HSA7
 
 #define GOMP_DEVICE_ICV-1
 #define GOMP_DEVICE_HOST_FALLBACK  -2
@@ -201,6 +202,7 @@ enum gomp_map_kind
 #define GOMP_VERSION   0
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
+#define GOMP_VERSION_HSA 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x)
@@ -228,4 +230,30 @@ enum gomp_map_kind
 #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x)
 #define GOMP_LAUNCH_OP_MAX 0x
 
+/* Bitmask to apply in order to find out the intended device of a target
+   argument.  */
+#define GOMP_TARGET_ARG_DEVICE_MASK((1 << 7) - 1)
+/* The target argument is significant for all devices.  */
+#define GOMP_TARGET_ARG_DEVICE_ALL 0
+
+/* Flag set when the subsequent element in the device-specific argument
+   values.  */
+#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM   (1 << 7)
+
+/* Bitmask to apply to a target argument to find out the value identifier.  */
+#define GOMP_TARGET_ARG_ID_MASK(((1 << 8) - 1) << 8)
+/* Target argument index of NUM_TEAMS.  */
+#define GOMP_TARGET_ARG_NUM_TEAMS  (1 << 8)
+/* Target argument index of THREAD_LIMIT.  */
+#define GOMP_TARGET_ARG_THREAD_LIMIT   (2 << 8)
+
+/* If the value is directly embeded in target argument, it should be a 

[hsa merge 07/10] IPA-HSA pass

2016-01-13 Thread Martin Jambor
Hi,

this patch contains IPA-related changes that we need to bring about
for HSA.

The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00720.html but so far we
have not received any feedback.  Let me quote the original
accompanying email here for reference:

When a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp
declare target" marked functions emerge out of OMP expansion as one
gimple function for both the host and the accelerator. However, at
some point we need to create a special HSA function representation so
that we can modify behavior of a (very) few optimization passes for
them.

Both are done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges to be in between HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can be registered together with libgomp.

Thanks,

Martin


2016-01-13  Martin Liska  
        Martin Jambor  

* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations into the
same partition as host implementations.
* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 000..dd47995
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* Callgraph based analysis of static variables.
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+   Contributed by Martin Liska 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emiting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+{
+  warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+ "could not emit HSAIL for function %s: function cannot be "
+ "cloned", node->name ());
+  return false;
+}
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  hsa_function_summary *s = hsa_summaries->get (node);
+
+  /* A linked function is skipped.  */
+  if (s->m_binded_function != NULL)
+   continue;
+
+  if (s->m_kind != HSA_NONE)
+   {
+ if (!check_warn_node_versionable (node))
+   continue;
+ cgraph_node *clone = node->create_virtual_clone
+   (vec  (), NULL, NULL, "hsa");
+ TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+ clone->force_output = true;
+ hsa_summari

[hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals

2016-01-13 Thread Martin Jambor
Hi,

this patch is new, it addresses a problem I outlined in
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00424.html and it is an
implementation of Jakub's suggestion in
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00614.html

I have refrained from bigger changes in struct copy_body_data in
tree-inline.h as I think that such a cleanup should be done
separately, but the structure could probably use some field
reordering to remove padding.
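
To illustrate the padding remark, here is a small sketch with two
stand-in structures.  They are not the real copy_body_data; the
pointer fields are plain void pointers and only the field names are
borrowed from the tree-inline.h hunk.  On common ABIs a flag placed
between two pointers is followed by padding, whereas flags grouped at
the end share one padded tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in layout with each flag following a pointer, as it would
   look if the new flag were simply appended to the old ordering.  */
struct interleaved_flags
{
  void *debug_map;
  bool remap_var_for_cilk;
  void *dependence_map;
  bool prevent_decl_creation_for_types;
};

/* Stand-in layout with both flags grouped after the pointers, which
   is the ordering the patch chooses.  */
struct grouped_flags
{
  void *debug_map;
  void *dependence_map;
  bool remap_var_for_cilk;
  bool prevent_decl_creation_for_types;
};

/* Hypothetical helper: bytes saved by grouping the flags.  */
static size_t
bytes_saved_by_grouping (void)
{
  return sizeof (struct interleaved_flags) - sizeof (struct grouped_flags);
}
```

The exact sizes depend on the ABI, so the sketch makes no claim about
specific numbers; the grouped layout should simply never be larger.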

I hope I have grasped it correctly and that the patch is OK for trunk.

Thanks,

Martin


2016-01-13  Martin Jambor  

* tree-inline.c (remap_decl): Use existing declarations if
remapping a type and prevent_decl_creation_for_types.
(replace_locals_stmt): Do an initial remapping of non-VLA typed
decls first.  Do real remapping with
prevent_decl_creation_for_types set.
* tree-inline.h (copy_body_data): New field
prevent_decl_creation_for_types, moved remap_var_for_cilk to avoid
padding.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 6bf2467..7b34288 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -340,8 +340,22 @@ remap_decl (tree decl, copy_body_data *id)
   return decl;
 }
 
-  /* If we didn't already have an equivalent for this declaration,
- create one now.  */
+  /* When remapping a type within copy_gimple_seq_and_replace_locals, all
+ necessary DECLs have already been remapped and we do not want to duplicate
+ a decl coming from outside of the sequence we are copying.  */
+  if (!n
+  && id->prevent_decl_creation_for_types
+  && id->remapping_type_depth > 0
+  && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL))
+{
+  if (id->do_not_unshare)
+   return decl;
+  else
+   return unshare_expr (decl);
+}
+
+  /* If we didn't already have an equivalent for this declaration, create one
+ now.  */
   if (!n)
 {
   /* Make a copy of the variable or label.  */
@@ -5225,8 +5239,19 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   /* This will remap a lot of the same decls again, but this should be
 harmless.  */
   if (gimple_bind_vars (stmt))
-   gimple_bind_set_vars (stmt, remap_decls (gimple_bind_vars (stmt),
-NULL, id));
+   {
+ tree old_var, decls = gimple_bind_vars (stmt);
+
+ for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var))
+   if (!can_be_nonlocal (old_var, id)
+   && ! variably_modified_type_p (TREE_TYPE (old_var), id->src_fn))
+ remap_decl (old_var, id);
+
+ gcc_checking_assert (!id->prevent_decl_creation_for_types);
+ id->prevent_decl_creation_for_types = true;
+ gimple_bind_set_vars (stmt, remap_decls (decls, NULL, id));
+ id->prevent_decl_creation_for_types = false;
+   }
 }
 
   /* Keep iterating.  */
diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h
index d3e5229..4cc1f19 100644
--- a/gcc/tree-inline.h
+++ b/gcc/tree-inline.h
@@ -140,14 +140,17 @@ struct copy_body_data
  the originals have been mapped to a value rather than to a
  variable.  */
   hash_map *debug_map;
- 
-  /* Cilk keywords currently need to replace some variables that
- ordinary nested functions do not.  */ 
-  bool remap_var_for_cilk;
 
   /* A map from the inlined functions dependence info cliques to
  equivalents in the function into which it is being inlined.  */
   hash_map *dependence_map;
+
+  /* Cilk keywords currently need to replace some variables that
+ ordinary nested functions do not.  */
+  bool remap_var_for_cilk;
+
+  /* Do not create new declarations when within type remapping.  */
+  bool prevent_decl_creation_for_types;
 };
 
 /* Weights of constructions for estimate_num_insns.  */



[hsa merge 03/10] HSA libgomp plugin

2016-01-13 Thread Martin Jambor
Hi,

the patch below adds the HSA-specific plugin for libgomp.  The plugin
implements the interface mandated by libgomp and takes care of finding
any available HSA devices, finalizing HSAIL code and running it on
HSA-capable GPUs.

This patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00716.html with a number
of modifications requested by Jakub.

Thanks,

Martin


2016-01-13  Martin Jambor  
Martin Liska  

* plugin/plugin-hsa.c: New file.

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
new file mode 100644
index 000..d888493
--- /dev/null
+++ b/libgomp/plugin/plugin-hsa.c
@@ -0,0 +1,1493 @@
+/* Plugin for HSAIL execution.
+
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+
+   Contributed by Martin Jambor  and
+   Martin Liska .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+
+/* Keep the following GOMP prefixed structures in sync with respective parts of
+   the compiler.  */
+
+/* Structure describing the run-time and grid properties of an HSA kernel
+   lauch.  */
+
+struct GOMP_kernel_launch_attributes
+{
+  /* Number of dimensions the workload has.  Maximum number is 3.  */
+  uint32_t ndim;
+  /* Size of the grid in the three respective dimensions.  */
+  uint32_t gdims[3];
+  /* Size of work-groups in the respective dimensions.  */
+  uint32_t wdims[3];
+};
+
+/* Collection of information needed for a dispatch of a kernel from a
+   kernel.  */
+
+struct GOMP_hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Levels-var ICV.  */
+  uint64_t omp_level;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct GOMP_hsa_kernel_dispatch **children_dispatches;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+};
+
+/* Part of the libgomp plugin interface.  Return the name of the accelerator,
+   which is "hsa".  */
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "hsa";
+}
+
+/* Part of the libgomp plugin interface.  Return the specific capabilities the
+   HSA accelerator have.  */
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400;
+}
+
+/* Part of the libgomp plugin interface.  Identify as HSA accelerator.  */
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_HSA;
+}
+
+/* Return the libgomp version number we're compatible with.  There is
+   no requirement for cross-version compatibility.  */
+
+unsigned
+GOMP_OFFLOAD_version (void)
+{
+  return GOMP_VERSION;
+}
+
+/* Flag to decide whether print to stderr information about what is going on.
+   Set in init_debug depending on environment variables.  */
+
+static bool debug;
+
+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;
+
+/* Initialize debug and suppress_host_fallback according to the environment.  */
+
+static void
+init_enviroment_variables (void)
+{
+  if (getenv ("HSA_DEBUG"))
+debug = true;
+  else
+debug = false;
+
+  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+suppress_host_fallback = true;
+  else
+suppress

[hsa merge 10/10] HSA register allocator

2016-01-13 Thread Martin Jambor
Hi,

because the HSA backend is not based on RTL, we need our own register
allocator, and it is in this patch.  The allocator has been written by
Michael Matz and I have put it into a separate email so that I can add
him to CC, because he is much better suited to answer any questions or
review comments.
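
As a rough orientation, the comment on m_reg_class_for_type in the
patch names four HSAIL register classes: 'c' (one bit), 's' (32 bit),
'd' (64 bit) and 'q' (128 bit).  The sketch below mirrors that mapping
but keys it on a bit width rather than on BRIG type constants, which
is a simplification made only for this illustration; both function
names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: register class number for a value of BITS
   bits.  Following the switch in the patch, types narrower than 32
   bits (other than the one-bit type) share the 's' class.  */
static int
reg_class_for_bit_width (unsigned bits)
{
  assert (bits >= 1 && bits <= 128);
  if (bits == 1)
    return 0;
  if (bits <= 32)
    return 1;
  if (bits <= 64)
    return 2;
  return 3;
}

/* Hypothetical helper: letter naming register class CLS.  */
static char
reg_class_letter (int cls)
{
  assert (cls >= 0 && cls <= 3);
  return "csdq"[cls];
}
```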

Thanks,

Martin


2016-01-13  Michael Matz 
Martin Jambor  

* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
new file mode 100644
index 000..5a42beb
--- /dev/null
+++ b/gcc/hsa-regalloc.c
@@ -0,0 +1,719 @@
+/* HSAIL IL Register allocation and out-of-SSA.
+   Copyright (C) 2013-2016 Free Software Foundation, Inc.
+   Contributed by Michael Matz 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "function.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "cfghooks.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+
+/* Process a PHI node PHI of basic block BB as a part of naive out-f-ssa.  */
+
+static void
+naive_process_phi (hsa_insn_phi *phi)
+{
+  unsigned count = phi->operand_count ();
+  for (unsigned i = 0; i < count; i++)
+{
+  gcc_checking_assert (phi->get_op (i));
+  hsa_op_base *op = phi->get_op (i);
+  hsa_bb *hbb;
+  edge e;
+
+  if (!op)
+   break;
+
+  e = EDGE_PRED (phi->m_bb, i);
+  if (single_succ_p (e->src))
+   hbb = hsa_bb_for_bb (e->src);
+  else
+   {
+ basic_block old_dest = e->dest;
+ hbb = hsa_init_new_bb (split_edge (e));
+
+ /* If switch insn used this edge, fix jump table.  */
+ hsa_bb *source = hsa_bb_for_bb (e->src);
+ hsa_insn_sbr *sbr;
+ if (source->m_last_insn
+ && (sbr = dyn_cast  (source->m_last_insn)))
+   sbr->replace_all_labels (old_dest, hbb->m_bb);
+   }
+
+  hsa_build_append_simple_mov (phi->m_dest, op, hbb);
+}
+}
+
+/* Naive out-of SSA.  */
+
+static void
+naive_outof_ssa (void)
+{
+  basic_block bb;
+
+  hsa_cfun->m_in_ssa = false;
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+hsa_bb *hbb = hsa_bb_for_bb (bb);
+hsa_insn_phi *phi;
+
+for (phi = hbb->m_first_phi;
+phi;
+phi = phi->m_next ? as_a  (phi->m_next): NULL)
+  naive_process_phi (phi);
+
+/* Zap PHI nodes, they will be deallocated when everything else will.  */
+hbb->m_first_phi = NULL;
+hbb->m_last_phi = NULL;
+  }
+}
+
+/* Return register class number for the given HSA TYPE.  0 means the 'c' one
+   bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
+   and 3 for 'q' 128 bit class.  */
+
+static int
+m_reg_class_for_type (BrigType16_t type)
+{
+  switch (type)
+{
+case BRIG_TYPE_B1:
+  return 0;
+
+case BRIG_TYPE_U8:
+case BRIG_TYPE_U16:
+case BRIG_TYPE_U32:
+case BRIG_TYPE_S8:
+case BRIG_TYPE_S16:
+case BRIG_TYPE_S32:
+case BRIG_TYPE_F16:
+case BRIG_TYPE_F32:
+case BRIG_TYPE_B8:
+case BRIG_TYPE_B16:
+case BRIG_TYPE_B32:
+case BRIG_TYPE_U8X4:
+case BRIG_TYPE_S8X4:
+case BRIG_TYPE_U16X2:
+case BRIG_TYPE_S16X2:
+case BRIG_TYPE_F16X2:
+  return 1;
+
+case BRIG_TYPE_U64:
+case BRIG_TYPE_S64:
+case BRIG_TYPE_F64:
+case BRIG_TYPE_B64:
+case BRIG_TYPE_U8X8:
+case BRIG_TYPE_S8X8:
+case BRIG_TYPE_U16X4:
+case BRIG_TYPE_S16X4:
+case BRIG_TYPE_F16X4:
+case BRIG_TYPE_U32X2:
+case BRIG_TYPE_S32X2:
+case BRIG_TYPE_F32X2:
+  return 2;
+
+case BRIG_TYPE_B128:
+case BRIG_TYPE_U8X16:
+case BRIG_TYPE_S8X16:
+case BRIG_TYPE_U16X8:
+case BRIG_TYPE_S16X8:
+case BRIG_TYPE_F16X8:
+case BRIG_TYPE_U32X4:
+case BRIG_TYPE_U64X2:
+case BRIG_TYPE_S32X4:
+case BRIG_TYPE_S64X2:
+case BRIG_TYPE_F32X4:
+case BRIG_TYPE_F64X2:
+  return 3;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+/* If the Ith

[hsa merge 08/10] HSAIL BRIG description header file

2016-01-13 Thread Martin Jambor
Hi,

the following patch adds a BRIG (binary representation of HSAIL)
representation description.  It is within a single header file
describing the binary structures and constants of the format.

The file comes from the HSA Foundation (I have only added the
HSA_BRIG_FORMAT_H macro and check and removed some weird comments
which are not present in proposed future versions of the file) and is
licensed under "University of Illinois/NCSA Open Source License."

The license is "GPL-compatible" according to FSF
(http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
so I believe we can have it in GCC.  Nevertheless, it is not GPL and
there is no copyright assignment for it, but the situation is
hopefully analogous to some other libraries that have their upstream
elsewhere but we ship them as part of the GCC.

In the previous posting of this patch
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
requested permission from the steering committee to include this file
with a different upstream in GCC.  I have not received an official
reply but since I have been chosen to be the HSA maintainer, I tend to
think there were no legal objections against HSA going forward,
including this file.

Thanks,

Martin


2015-12-04  Martin Jambor  

* hsa-brig-format.h: New file.

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
new file mode 100644
index 000..6e2fe75
--- /dev/null
+++ b/gcc/hsa-brig-format.h
@@ -0,0 +1,1277 @@
+// University of Illinois/NCSA
+// Open Source License
+//
+// Copyright (c) 2013-2015, Advanced Micro Devices, Inc.
+// All rights reserved.
+//
+// Developed by:
+//
+// HSA Team
+//
+// Advanced Micro Devices, Inc
+//
+// www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy of
+// this software and associated documentation files (the "Software"), to deal with
+// the Software without restriction, including without limitation the rights to
+// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+// of the Software, and to permit persons to whom the Software is furnished to do
+// so, subject to the following conditions:
+//
+// * Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimers.
+//
+// * Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimers in the
+//   documentation and/or other materials provided with the distribution.
+//
+// * Neither the names of the HSA Team, University of Illinois at
+//   Urbana-Champaign, nor the names of its contributors may be used to
+//   endorse or promote products derived from this Software without specific
+//   prior written permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+// SOFTWARE.
+// SOFTWARE.
+
+#ifndef HSA_BRIG_FORMAT_H
+#define HSA_BRIG_FORMAT_H
+
+typedef uint32_t BrigVersion32_t;
+
+enum BrigVersion {
+
+BRIG_VERSION_HSAIL_MAJOR = 1,
+BRIG_VERSION_HSAIL_MINOR = 0,
+BRIG_VERSION_BRIG_MAJOR  = 1,
+BRIG_VERSION_BRIG_MINOR  = 0
+};
+
+typedef uint8_t BrigAlignment8_t;
+
+typedef uint8_t BrigAllocation8_t;
+
+typedef uint8_t BrigAluModifier8_t;
+
+typedef uint8_t BrigAtomicOperation8_t;
+
+typedef uint32_t BrigCodeOffset32_t;
+
+typedef uint8_t BrigCompareOperation8_t;
+
+typedef uint16_t BrigControlDirective16_t;
+
+typedef uint32_t BrigDataOffset32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetString32_t;
+
+typedef uint8_t BrigExecutableModifier8_t;
+
+typedef uint8_t BrigImageChannelOrder8_t;
+
+typedef uint8_t BrigImageChannelType8_t;
+
+typedef uint8_t BrigImageGeometry8_t;
+
+typedef uint8_t BrigImageQuery8_t;
+
+typedef uint16_t BrigKind16_t;
+
+typedef uint8_t BrigLinkage8_t;
+
+typedef uint8_t BrigMachineModel8_t;
+
+typedef uint8_t BrigMemoryModifier8_t;
+
+typedef uint8_t BrigMemoryOrder8_t;
+
+typedef uint8_t BrigMemoryScope8_t;
+
+typedef uint16_t BrigOpcode16_t;
+
+typedef uint32_t BrigOperandOffset32_t;
+
+typedef uint8_t BrigPack8_t;
+
+typedef uint8_t BrigProfile8_t;
+
+typedef uint16_t BrigRegisterKind16_t;
+
+typedef uint8_t BrigRound8_t;
+
+typedef uint8_t BrigSamplerAddressing8_t;
+
+typedef uint8_t BrigSamplerCoordNormalization8_t;
+
+typedef uint8_t BrigSamplerFilter8_t;
+
+typedef uint8_t 

[hsa merge 05/10] OpenMP lowering/expansion changes (gridification)

2016-01-13 Thread Martin Jambor
Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.
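
To give an idea of the kind of loop in question, here is a minimal
sketch.  It assumes that a simple, perfectly parallel loop under a
combined target teams distribute parallel for construct is the sort
of candidate meant here; the precise conditions are whatever the
grid_target_follows_gridifiable_pattern check in the patch accepts,
and the function name vector_add is made up for the example.

```c
/* Hypothetical example of a loop that might be turned into a GPU
   kernel.  Without -fopenmp the pragma is ignored and the loop simply
   runs on the host, so the result is the same either way.  */
static void
vector_add (int n, const float *a, const float *b, float *c)
{
#pragma omp target teams distribute parallel for \
  map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Each iteration is independent and the trip count is known before the
loop starts, which is what would let one iteration map to one
work-item of a grid.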

The following is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00718.html with a fair
amount of incorporated feedback, almost all of which has been posted in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01884.html.

Thanks,

Martin


2016-01-13  Martin Jambor  

gcc/
* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_GRID_LOOP.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_grid_body): New function.
(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
* gimple.def (GIMPLE_OMP_GRID_BODY): New.
* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
GF_OMP_TEAMS_GRID_PHONY.
(gimple_statement_omp_single_layout): Updated comments.
(gimple_build_omp_grid_body): New function.
(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
(gimple_omp_for_grid_phony): New function.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_parallel_grid_phony): Likewise.
(gimple_omp_parallel_set_grid_phony): Likewise.
(gimple_omp_teams_grid_phony): Likewise.
(gimple_omp_teams_set_grid_phony): Likewise.
(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
(BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
(adjust_for_condition): New function.
(get_omp_for_step_from_incr): Likewise.
(extract_omp_for_data): Moved parts to adjust_for_condition and
get_omp_for_step_from_incr.
(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
(fixup_child_record_type): Bail out if receiver_decl is NULL.
(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
(scan_omp_parallel): Do not create child functions for phony
constructs.
(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
(scan_omp_1_op): Checking assert we are not remapping to
ERROR_MARK.  Also handle GIMPLE_OMP_GRID_BODY.
(parallel_needs_hsa_kernel_p): New function.
(expand_parallel_call): Register appropriate parallel child
functions as HSA kernels.
(grid_launch_attributes_trees): New type.
(grid_attr_trees): New variable.
(grid_create_kernel_launch_attr_types): New function.
(grid_insert_store_range_dim): Likewise.
(grid_get_kernel_launch_attributes): Likewise.
(get_target_argument_identifier_1): Likewise.
(get_target_argument_identifier): Likewise.
(get_target_argument_value): Likewise.
(push_target_argument_according_to_value): Likewise.
(get_target_arguments): Likewise.
(expand_omp_target): Call get_target_arguments instead of looking
up for teams and thread limit.
(grid_expand_omp_for_loop): New function.
(grid_arg_decl_map): New type.
(grid_remap_kernel_arg_accesses): New function.
(grid_expand_target_kernel_body): New function.
(expand_omp): Call it.
(lower_omp_for): Do not emit phony constructs.
(lower_omp_taskreg): Do not emit phony constructs but create for them
a temporary variable receiver_decl.
(lower_omp_taskreg): Do not emit phony constructs.
(lower_omp_teams): Likewise.
(lower_omp_grid_body): New function.
(lower_omp_1): Call it.
(grid_reg_assignment_to_local_var_p): New function.
(grid_seq_only_contains_local_assignments): Likewise.
(grid_find_single_omp_among_assignments_1): Likewise.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
(grid_remap_prebody_decls): Likewise.
(grid_copy_leading_local_assignments): Likewise.
(grid_process_kernel_body_copy): Likewise.
(grid_attempt_target_gridification): Likewise.
(grid_gridify_all_targets_stmt): Likewise.
(grid_gridify_all_targets): Likewise.
(execute_lower_omp): Call grid_gridify_all_targets.
(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
* tree-core.h (omp_clause_code): Added

Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-15 Thread Martin Jambor
Hi,

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek  writes:
> 
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description.  It is within a single header file
> >> describing the binary structures and constants of the format.
> >> 
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >> 
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC.  Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >> 
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC.  I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
> 
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license?  It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.

I will raise this with the HSA PRM group and perhaps there is a slight
chance that they will change this in the upcoming version of HSAIL.
But it is not going to happen soon enough.

> 
> Barring that, I would personally prefer that you write your own version
> of this header file, defining the constants and structs that you need.
> That's basically what we've done for ELF and COFF and Mach-O, several
> times over.  For example, libiberty/simple-object-elf.c.

Well, if we have done something like this before, I can go through the
exercise of copy'n'pasting everything from the PDF specification, if
that allowed us to "own" the file and put it under GPL 3.  But I must
say I do not know.

It is going to be a bit tedious job (and it would be good to double
check I made no mistakes somehow) but it is certainly doable.  I guess
I will embark on it after going through the rest of the review (unless
someone here tells me I should not, that is).

> 
> Barring that, I agree with Jakub that this looks like something that
> should go in the top-level include subdirectory rather than the gcc
> subdirectory.

Even if I "create" a copy of our own?  But sure, no problem.

Martin


Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)

2016-01-15 Thread Martin Jambor
Thanks Jakub and Alex,

I have committed the following to the branch to address your comments:

2016-01-15  Martin Jambor  

* gimple.h: Fixed comment of gimple_statement_omp_single_layout
* omp-low.c (get_target_argument_value): Fixed spelling in its
comment.
(push_target_argument_according_to_value): Likewise.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): Renamed to
OMP_CLAUSE__GRIDDIM__DIMENSION
---
 gcc/gimple.h|  2 +-
 gcc/omp-low.c   | 12 ++--
 gcc/tree-pretty-print.c |  2 +-
 gcc/tree.h  |  5 +
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 7eef07c..6d15dab 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -730,7 +730,7 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 };
 
-/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */
+/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gimple_statement_omp_single_layout : public gimple_statement_omp
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c534f5c..616c5bd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12741,7 +12741,7 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator 
*gsi,
   if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
continue;
 
-  unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause);
+  unsigned dim = OMP_CLAUSE__GRIDDIM__DIMENSION (clause);
   max_dim = MAX (dim, max_dim);
 
   grid_insert_store_range_dim (gsi, lattrs,
@@ -12788,7 +12788,7 @@ get_target_argument_identifier (int device, bool 
subseqent_param, int id)
   return fold_convert (ptr_type_node, t);
 }
 
-/* Return a target argument consisiting of DEVICE identifier, value identifier
+/* Return a target argument consisting of DEVICE identifier, value identifier
ID, and the actual VALUE.  */
 
 static tree
@@ -12806,8 +12806,8 @@ get_target_argument_value (gimple_stmt_iterator *gsi, 
int device, int id,
 }
 
 /* If VALUE is an integer constant greater than -2^15 and smaller than 2^15,
-   push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it,
-   otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two
+   push one argument to ARGS with both the DEVICE, ID and VALUE embedded in it,
+   otherwise push an identifier (with DEVICE and ID) and the VALUE in two
arguments.  */
 
 static void
@@ -17693,7 +17693,7 @@ grid_attempt_target_gridification (gomp_target *target,
ws = build_zero_cst (uint32_type_node);
 
   tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_);
-  OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i);
+  OMP_CLAUSE__GRIDDIM__DIMENSION (c) = i;
   OMP_CLAUSE__GRIDDIM__SIZE (c) = gs;
   OMP_CLAUSE__GRIDDIM__GROUP (c) = ws;
   OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
@@ -17749,7 +17749,7 @@ grid_gridify_all_targets (gimple_seq *body_p)
   memset (&wi, 0, sizeof (wi));
   walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi);
 }
-
+
 
 /* Main entry point.  */
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 31cea10..9c13d84 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -944,7 +944,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
int flags)
 
 case OMP_CLAUSE__GRIDDIM_:
   pp_string (pp, "_griddim_(");
-  pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause));
+  pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause));
   pp_colon (pp);
   dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
 false);
diff --git a/gcc/tree.h b/gcc/tree.h
index e885ea1..9b987bb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1636,12 +1636,9 @@ extern void protected_set_expr_location (tree, 
location_t);
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
 
-#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
+#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
->omp_clause.subcode.dimension)
-#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
-  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
-   ->omp_clause.subcode.dimension = (DIMENSION))
 #define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
 #define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
-- 
2.6.4



Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Martin Jambor
On Thu, Jan 14, 2016 at 01:58:58PM +0100, Jakub Jelinek wrote:
> Otherwise LGTM.
> 
>   Jakub

Thanks Jakub, I have committed the following patch from Martin Liska
that addresses your comments.

Martin

2016-01-15  Martin Liska  

* ipa-hsa.c (process_hsa_functions): Fixed coding style.
(ipa_hsa_read_section): Likewise.
(ipa_hsa_read_section): Likewise.
(pass_ipa_hsa::gate): Removed in_lto_p from the condition.
---
 gcc/ipa-hsa.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index dd47995..769657f 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -86,8 +86,9 @@ process_hsa_functions (void)
{
  if (!check_warn_node_versionable (node))
continue;
- cgraph_node *clone = node->create_virtual_clone
-   (vec  (), NULL, NULL, "hsa");
+ cgraph_node *clone
+   = node->create_virtual_clone (vec  (),
+ NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
  clone->force_output = true;
@@ -102,8 +103,9 @@ process_hsa_functions (void)
{
  if (!check_warn_node_versionable (node))
continue;
- cgraph_node *clone = node->create_virtual_clone
-   (vec  (), NULL, NULL, "hsa");
+ cgraph_node *clone
+   = node->create_virtual_clone (vec  (),
+ NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
  if (!cgraph_local_p (node))
@@ -209,8 +211,8 @@ static void
 ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
   size_t len)
 {
-  const struct lto_function_header *header =
-(const struct lto_function_header *) data;
+  const struct lto_function_header *header
+= (const struct lto_function_header *) data;
   const int cfg_offset = sizeof (struct lto_function_header);
   const int main_offset = cfg_offset + header->cfg_size;
   const int string_offset = main_offset + header->main_size;
@@ -221,9 +223,9 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, 
const char *data,
   lto_input_block ib_main ((const char *) data + main_offset,
   header->main_size, file_data->mode_table);
 
-  data_in =
-lto_data_in_create (file_data, (const char *) data + string_offset,
-   header->string_size, vNULL);
+  data_in
+= lto_data_in_create (file_data, (const char *) data + string_offset,
+ header->string_size, vNULL);
   count = streamer_read_uhwi (&ib_main);
 
   for (i = 0; i < count; i++)
@@ -317,7 +319,7 @@ public:
 bool
 pass_ipa_hsa::gate (function *)
 {
-  return hsa_gen_requested_p () || in_lto_p;
+  return hsa_gen_requested_p ();
 }
 
 } // anon namespace
-- 
2.6.4





Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-15 Thread Martin Jambor
Hi,

thanks Jakub.  Below you'll find a patch, which is mostly work of
Martin Liska, that should address all the review comments.  We then
also went over the "XXX" marks (my bad that I forgot that Michael
uses this mark), removed half of them and turned the rest into TODOs.

Let me just quickly answer two comments as well:

On Thu, Jan 14, 2016 at 03:05:33PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:34PM +0100, Martin Jambor wrote:
>
...
> > +#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines 
> > have " \
> > +  "undefined semantics within target regions, support for HSA ignores them"
> 
> Well, if you don't support them in HSA target regions, you'd better punt and
> not error on them.

We don't error, apart from issuing a warning we basically ignore them.
I believe we can do it even in the long term and that it is in fact
useful because the standard says that the "effect" of these routines
is "unspecified" if they get called from a target region.

Perhaps this is even something we should warn about earlier in omp
lowering/expansion.

...

> > +unsigned
> > +hsa_internal_fn::get_arity ()
> > +{
> > +  switch (m_fn)
> > +{
> > +case IFN_ACOS:
> > +case IFN_ASIN:
> > +case IFN_ATAN:
> > +case IFN_COS:
> > +case IFN_EXP:
> > +case IFN_EXP10:
> > +case IFN_EXP2:
> > +case IFN_EXPM1:
> > +case IFN_LOG:
> > +case IFN_LOG10:
> > +case IFN_LOG1P:
> > +case IFN_LOG2:
> > +case IFN_LOGB:
> > +case IFN_SIGNIFICAND:
> > +case IFN_SIN:
> > +case IFN_SQRT:
> > +case IFN_TAN:
> > +case IFN_CEIL:
> > +case IFN_FLOOR:
> > +case IFN_NEARBYINT:
> > +case IFN_RINT:
> > +case IFN_ROUND:
> > +case IFN_TRUNC:
> > +  return 1;
> > +case IFN_ATAN2:
> > +case IFN_COPYSIGN:
> > +case IFN_FMOD:
> > +case IFN_POW:
> > +case IFN_REMAINDER:
> > +case IFN_SCALB:
> > +case IFN_LDEXP:
> > +  return 2;
> > +  break;
> > +case IFN_CLRSB:
> > +case IFN_CLZ:
> > +case IFN_CTZ:
> > +case IFN_FFS:
> > +case IFN_PARITY:
> > +case IFN_POPCOUNT:
> > +default:
> > +  gcc_unreachable ();
> 
> There are various other IFNs (e.g. for __builtin_{add,sub,mul}_overflow,
> lots of others).  How do you ensure you don't ICE on those?

Martin added a comment explaining this.  This can only be reached when
we already know we are processing a known builtin, filtered by
gen_hsa_insn_for_internal_fn_call.
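
The argument above can be illustrated with a small stand-alone sketch: a filter decides whether a function is supported at all, and the arity lookup is only ever called after the filter has accepted it, so its default case stays unreachable.  All names below are hypothetical stand-ins, not the real hsa-gen.c code, and returning 0 for an unsupported function is an assumption made purely for the example.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a few internal function codes.  */
enum fn_kind { FN_SIN, FN_SQRT, FN_POW, FN_ADD_OVERFLOW };

/* The filter: accept only the functions the back end knows about.  */
static int
supported_fn_p (enum fn_kind fn)
{
  switch (fn)
    {
    case FN_SIN:
    case FN_SQRT:
    case FN_POW:
      return 1;
    default:
      return 0;
    }
}

/* Arity lookup; may only be called for functions accepted by the filter,
   which is why aborting in the default case is safe.  */
static unsigned
get_arity (enum fn_kind fn)
{
  switch (fn)
    {
    case FN_SIN:
    case FN_SQRT:
      return 1;
    case FN_POW:
      return 2;
    default:
      abort ();
    }
}

/* Caller: consult the filter first and only then ask for the arity.
   Returns 0 for unsupported functions (an assumption of this sketch).  */
static unsigned
arity_or_zero (enum fn_kind fn)
{
  if (!supported_fn_p (fn))
    return 0;
  return get_arity (fn);
}
```

The price of this arrangement is that the two switch statements have to be kept in sync by hand whenever a new function is added.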

Thanks for looking at the code,

Martin

2016-01-15  Martin Liska  
Martin Jambor  

* hsa-brig.c (struct function_linkage_pair): Fix GNU coding style
and replace sprintf with snprintf.
(hsa_brig_section::init): Likewise.
(hsa_brig_section::output): Likewise.
(hsa_brig_section::get_ptr_by_offset): Likewise.
(brig_string_slot_hasher::hash): Likewise.
(brig_string_slot_hasher::equal): Likewise.
(brig_string_slot_hasher::remove): Likewise.
(brig_emit_string): Likewise.
(brig_init): Likewise.
(emit_directive_variable): Likewise.
(emit_function_directives): Likewise.
(emit_bb_label_directive): Likewise.
(emit_immediate_scalar_to_buffer): Likewise.
(hsa_op_immed::emit_to_buffer): Likewise.
(emit_immediate_operand): Likewise.
(emit_address_operand): Likewise.
(emit_memory_insn): Likewise.
(emit_alloca_insn): Likewise.
(emit_cmp_insn): Likewise.
(emit_branch_insn): Likewise.
(emit_switch_insn): Likewise.
(emit_call_insn): Likewise.
(emit_arg_block_insn): Likewise.
(emit_packed_insn): Likewise.
(emit_basic_insn): Likewise.
(hsa_brig_emit_function): Likewise.
(hsa_output_global_variables): Likewise.
(hsa_output_kernels): Likewise.
(hsa_output_libgomp_mapping): Likewise.
(hsa_output_brig): Likewise.
* hsa-dump.c (dump_hsa_immed): Likewise.
(dump_hsa_insn_1): Likewise.
* hsa-gen.c (hsa_symbol::total_byte_size): Likewise.
(hsa_init_simple_builtins): Likewise.
(hsa_init_data_for_cfun): Likewise.
(hsa_type_for_scalar_tree_type): Likewise.
(get_symbol_for_decl): Likewise.
(hsa_get_host_function): Likewise.
(hsa_op_immed::hsa_op_immed): Likewise.
(hsa_insn_mem::hsa_insn_mem): Likewise.
(hsa_insn_atomic::hsa_insn_atomic): Likewise.
(hsa_insn_seg::hsa_insn_seg): Likewise.
(hsa_insn_srctype::hsa_insn_srctype)

Re: [hsa merge 10/10] HSA register allocator

2016-01-15 Thread Martin Jambor
Hi,

On Thu, Jan 14, 2016 at 03:41:34PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:35PM +0100, Martin Jambor wrote:
> > +for (phi = hbb->m_first_phi;
> > +phi;
> > +phi = phi->m_next ? as_a  (phi->m_next): NULL)
> 
> Space before :
> 
> Ok with that change.
> 

I have committed the following patch from Martin to address this and a
few other code style issues.

Thanks,

Martin

2016-01-15  Martin Liska  

* hsa-regalloc.c (naive_outof_ssa): Fixed coding style.
(linear_scan_regalloc): Likewise.
(regalloc): Likewise.
---
 gcc/hsa-regalloc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index 5a42beb..f8e83ecf 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -90,7 +90,7 @@ naive_outof_ssa (void)
 
 for (phi = hbb->m_first_phi;
 phi;
-phi = phi->m_next ? as_a  (phi->m_next): NULL)
+phi = phi->m_next ? as_a  (phi->m_next) : NULL)
   naive_process_phi (phi);
 
 /* Zap PHI nodes, they will be deallocated when everything else will.  */
@@ -525,7 +525,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
   else
after_end_number = insn_order;
   /* Everything live-out in this BB has at least an end point
- after us. */
+after us.  */
   EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
note_lr_end (ind2reg[bit], after_end_number);
 
@@ -549,7 +549,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
}
 
   /* Everything live-in in this BB has a start point before
- our first insn.  */
+our first insn.  */
   int before_start_number;
   if (hbb->m_first_insn)
before_start_number = hbb->m_first_insn->m_number;
@@ -570,7 +570,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
   are defined at the start of the routine (prologue).  */
if (ind2reg[i]->m_lr_begin == insn_order)
  ind2reg[i]->m_lr_begin = 0;
-   /* All regs that have no use but a def will have lr_end == 0, 
+   /* All regs that have no use but a def will have lr_end == 0,
   they are actually live from def until after the insn they are
   defined in.  */
if (ind2reg[i]->m_lr_end == 0)
@@ -672,7 +672,7 @@ regalloc (void)
   basic_block bb;
   m_reg_class_desc classes[4];
 
-  /* If there are no registers used in the function, exit right away. */
+  /* If there are no registers used in the function, exit right away.  */
   if (hsa_cfun->m_reg_count == 0)
 return;
 
-- 
2.6.4



Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Martin Jambor
Hi,

On Fri, Jan 15, 2016 at 04:01:49PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote:
> > @@ -317,7 +319,7 @@ public:
> >  bool
> >  pass_ipa_hsa::gate (function *)
> >  {
> > -  return hsa_gen_requested_p () || in_lto_p;
> > +  return hsa_gen_requested_p ();
> >  }
> >  
> >  } // anon namespace
> 
> I actually didn't mean this, I mean more of:
>   return (hsa_gen_requested_p ()
> #ifdef ENABLE_HSA
> || in_lto_p
> #endif
>);
> or so.  Unless you arrange in lto-wrapper or where that if
> HSA is enabled in any LTO input source, then it is enabled also in
> lto1.  If you do that, your change is fine.
> 

This pass only creates HSA specific clones of ungridified target and
parallel regions and functions marked with declare target.  Whether or
not any HSAIL is emitted is then controlled in the hsa-gen pass gate.
The in_lto_p part was in fact a relict of a previous implementation.

So while I agree that making such a change to lto-wrapper would be
beneficial (although then we should limit its activity only to those
nodes which come from enabled units), the change above does not make
the current situation worse.  I will make sure to look into
lto-wrapper but meanwhile I still prefer the new condition.

We have tested the new change and LTO compiled code with HSA enabled
and LTO linked it with HSA disabled and:
  1) if there was no gridified loop, the result was like HSA was
 disabled from the start

  2) if there was a gridified kernel, the compiler compiled the kernel
 for the host but did not register it with libgomp and it ended up
 as an unreachable function.

How do other accelerators cope with the situation when half of the
application is compiled with the accelerator disabled?  (Would some of
their calls to GOMP_target_ext lead to abort?)

Martin


Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-15 Thread Martin Jambor
On Fri, Jan 15, 2016 at 01:03:35PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 11:37:32AM +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote:
> > > > Martin, could you ask the HSA Foundation or AMD or whoever if there is
> > > > any way they could remove the second requirement of the license?  It
> > > > adds yet another case where anybody distributing GCC has to list yet
> > > > another copyright notice.
> > > 
> > > I will raise this with the HSA PRM group and perhaps there is a slight
> > > chance that they will change this in the upcoming version of HSAIL.
> > > But it is not going to happen soon enough.
> > 
> > Under what license is
> > http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm
> > ?  Sounds the same as the pdf to me.
> > Unlike the pdf version thereof, you could grab the ... chunks
> > out of this fairly easily with recursive wget and some quick scripting.
> 
> E.g.
> for i in `seq 2 123`; do sed 
> 's/\r$//;s/</]*>/\n\n/g;s/<\/pre>/\n<\/pre>\n/g;s/ name=[^>]*><\/a>//g' $i | sed -n '/^$/,/^<\/pre>$/{/^<.*pre>$/d;p}'; done
> on downloaded (in the order of appearance in the toc) files, I get
> following, which while it doesn't compile, I suppose some manual reordering
> and if it is needed in C, also e.g. in case of typedef BrigModuleHeader* 
> BrigModule_t; adding
> struct before BrigModuleHeader or turning that struct also into a typedef, 
> might make it work.
> Now the question is if it covers all you care about.
> 

Yes it does.  We have massaged it just a little and it works fine (and
the compiler is also basically the same binary-wise).  So we will
go with the following hsa-brig-format.h (in its old location in gcc/).

Thanks for this input, it really helped,

Martin


/* HSA BRIG (binary representation of HSAIL) 1.0.1 representation description.
   Copyright (C) 2016 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.

The contents of the file was created by extracting data structures, enum,
typedef and other definitions from HSA Programmer's Reference Manual Version
1.0.1 (http://www.hsafoundation.com/standards/).

HTML version is provided on the following link:
http://www.hsafoundation.com/html/Content/PRM/Topics/PRM_title_page.htm */

#ifndef HSA_BRIG_FORMAT_H
#define HSA_BRIG_FORMAT_H

struct BrigModuleHeader;
typedef uint16_t BrigKind16_t;
typedef uint32_t BrigVersion32_t;

typedef BrigModuleHeader *BrigModule_t;
typedef uint32_t BrigDataOffset32_t;
typedef uint32_t BrigCodeOffset32_t;
typedef uint32_t BrigOperandOffset32_t;
typedef BrigDataOffset32_t BrigDataOffsetString32_t;
typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
typedef uint8_t BrigAlignment8_t;

enum BrigAlignment
{
  BRIG_ALIGNMENT_NONE = 0,
  BRIG_ALIGNMENT_1 = 1,
  BRIG_ALIGNMENT_2 = 2,
  BRIG_ALIGNMENT_4 = 3,
  BRIG_ALIGNMENT_8 = 4,
  BRIG_ALIGNMENT_16 = 5,
  BRIG_ALIGNMENT_32 = 6,
  BRIG_ALIGNMENT_64 = 7,
  BRIG_ALIGNMENT_128 = 8,
  BRIG_ALIGNMENT_256 = 9
};

typedef uint8_t BrigAllocation8_t;

enum BrigAllocation
{
  BRIG_ALLOCATION_NONE = 0,
  BRIG_ALLOCATION_PROGRAM = 1,
  BRIG_ALLOCATION_AGENT = 2,
  BRIG_ALLOCATION_AUTOMATIC = 3
};

typedef uint8_t BrigAluModifier8_t;

enum BrigAluModifierMask
{
  BRIG_ALU_FTZ = 1
};

typedef uint8_t BrigAtomicOperation8_t;

enum BrigAtomicOperation
{
  BRIG_ATOMIC_ADD = 0,
  BRIG_ATOMIC_AND = 1,
  BRIG_ATOMIC_CAS = 2,
  BRIG_ATOMIC_EXCH = 3,
  BRIG_ATOMIC_LD = 4,
  BRIG_ATOMIC_MAX = 5,
  BRIG_ATOMIC_MIN = 6,
  BRIG_ATOMIC_OR = 7,
  BRIG_ATOMIC_ST = 8,
  BRIG_ATOMIC_SUB = 9,
  BRIG_ATOMIC_WRAPDEC = 10,
  BRIG_ATOMIC_WRAPINC = 11,
  BRIG_ATOMIC_XOR = 12,
  BRIG_ATOMIC_WAIT_EQ = 13,
  BRIG_ATOMIC_WAIT_NE = 14,
  BRIG_ATOMIC_WAIT_LT = 15,
  BRIG_ATOMIC_WAIT_GTE = 16,
  BRIG_ATOMIC_WAITTIMEOUT_EQ = 17,
  BRIG_ATOMIC_WAITTIMEOUT_NE = 18,
  BRIG_ATOMIC_WAITTIMEOUT_LT = 19,
  BRIG_ATOMIC_WAITTIMEOUT_GTE = 20
};

struct BrigBase
{
  uint16_t byteCount;
  BrigKind16_t kind;
};

typedef uint8_t BrigCompareOperation8_t;

enum BrigCompareOperation
{
  BRIG_COMPARE_EQ = 0,
  BRIG_COMPARE_N

Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-15 Thread Martin Jambor
Hi,

bootstrapping on i686-linux revealed the need for the following simple
patch.  I've run into two types of compilation errors on
powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
work around them quickly by making most of the contents of hsa-*.c
files compiled only conditionally (and leave potential hsa support on
non-linux platforms for later), but I will not have time to do the
change and test it properly until Monday.

But that will hopefully really be it,

Martin


2016-01-16  Martin Jambor  

* hsa-dump.c (dump_hsa_symbol): Add missing argument cast.

diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index af79bcb..c5f1f69 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -720,7 +720,7 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
   hsa_type_name (symbol->m_type & ~BRIG_TYPE_ARRAY_MASK), name);
 
   if (symbol->m_type & BRIG_TYPE_ARRAY_MASK)
-fprintf (f, "[%lu]", symbol->m_dim);
+fprintf (f, "[%lu]", (unsigned long) symbol->m_dim);
 }
 
 /* Dump textual representation of HSA IL operand OP to file F.  */
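
A minimal stand-alone sketch of why such a cast is needed, assuming (the email does not say so) that m_dim has a 64-bit type: on a 32-bit host unsigned long is only 32 bits wide, so passing the value to %lu without a cast does not match the format.  The helper name format_dim is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Format DIM as "[N]".  The explicit cast makes the argument match %lu on
   every host; on hosts with a 32-bit unsigned long it truncates values that
   do not fit.  Returns a pointer to a static buffer.  */
static const char *
format_dim (uint64_t dim)
{
  static char buf[32];

  snprintf (buf, sizeof buf, "[%lu]", (unsigned long) dim);
  return buf;
}
```

Where truncation matters, the PRIu64 macro from <inttypes.h> would print the full value instead.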


Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-18 Thread Martin Jambor
Hi,

On Sat, Jan 16, 2016 at 12:43:07PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 06:23:05PM +0100, Martin Jambor wrote:
> >   BRIG_KIND_OPERAND_REGISTER = 0x300a,
> >   BRIG_KIND_OPERAND_STRING = 0x300b,
> >   BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
> >   BRIG_KIND_OPERAND_END = 0x300d
> 
> The above looks weird, I'd have expected BRIG_KIND_OPERAND_WAVESIZE
> to be 0x300c instead.  Bug in the standard?
> As typedef uint16_t BrigKind16_t;, I'm afraid this doesn't even fit
> into the data type.  Note the original brig header you've posted
> had this fixed.
> 

That is clearly a bug.  We did not catch it when comparing the compiler
binary because we never use this constant.  Have you found this by
hand or did you do any more systematic comparison?

BRIG is always validated when finalized and I believe that fortunately
this particular bug would be caught by that as would majority of
similar "random" ones.
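
To make the size problem concrete, here is a small sketch of what happens when the mistyped constant is stored in a 16-bit kind field.  Only the BrigKind16_t typedef is taken from the header; the helper names are hypothetical.

```c
#include <stdint.h>

typedef uint16_t BrigKind16_t;

/* Store a kind constant the way a 16-bit kind field would hold it; the
   conversion silently drops everything above the low 16 bits.  */
static BrigKind16_t
store_kind (uint32_t kind)
{
  return (BrigKind16_t) kind;
}

/* Hypothetical check: does KIND survive the round trip through 16 bits?  */
static int
kind_fits_p (uint32_t kind)
{
  return store_kind (kind) == kind;
}
```

With the mistyped value, store_kind (0x3009c) yields 0x009c, which is not in the 0x30xx operand range at all, whereas the corrected 0x300c is stored unchanged.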

I am going to commit the following patch to the branch.

Thanks for spotting this.

Martin


2016-01-18  Martin Jambor  

* hsa-brig-format.h (BrigKind): Fix the value of
BRIG_KIND_OPERAND_WAVESIZE.
---
 gcc/hsa-brig-format.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
index 247799b..e1c6cd2 100644
--- a/gcc/hsa-brig-format.h
+++ b/gcc/hsa-brig-format.h
@@ -303,7 +303,7 @@ enum BrigKind
   BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009,
   BRIG_KIND_OPERAND_REGISTER = 0x300a,
   BRIG_KIND_OPERAND_STRING = 0x300b,
-  BRIG_KIND_OPERAND_WAVESIZE = 0x3009c,
+  BRIG_KIND_OPERAND_WAVESIZE = 0x300c,
   BRIG_KIND_OPERAND_END = 0x300d
 };
 
-- 
2.6.4




Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-18 Thread Martin Jambor
Hi,

On Sat, Jan 16, 2016 at 09:58:51AM +0100, Jakub Jelinek wrote:
> On Sat, Jan 16, 2016 at 12:49:12AM +0100, Martin Jambor wrote:
> > bootstrapping on i686-linux revealed the need for the following simple
> > patch.  I've run into two types of compilation errors on
> > powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
> > somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
> > workaround them quickly by making most of the contents of hsa-*.c
> > files compiled only conditionally (and leave potential hsa support on
> > non-linux platforms for later), but I will not have time to do the
> > change and test it properly until Monday.
> > 
> > But that will hopefully really be it,
> 
> IMHO you'd be best to write your own helpers for conversion to little
> endian (and back).
> gcc configure already has AC_C_BIGENDIAN (dunno how it handles pdp endian
> host though, so not sure if it is safe to rely on that), for recent GCC
> you can use __BYTE_ORDER__ macro to check endianity and __builtin_bswap*.
> So perhaps just
> #if GCC_VERSION >= 4006
> // use __BYTE_ORDER__ and __builtin_bswap or nothing
> #else
> // provide a safe slower default, with shifts and masking
> #endif
> 
> As for rs6000_xcoff_strip_dollar, look at other sources that use it what
> headers they do include, bet you want to #include "tm_p.h" to make it work.
> 

thanks for the suggestion.  With the following two patches, I can
compile HSA branch on powerpc-aix.  I'm going to prepare a new patch
with them, bootstrap it on x86_64, i686 and ppc-aix and unless
something new pops up again, I will commit it either tonight or
early tomorrow morning.

I have tested the slow paths of little endian conversion only very
rudimentarily but I did.  OTOH, I am actually not quite sure how 64
bit-wide numbers are spaced out on PDP-endian systems.  But I guess it
is OK to fix those only later if I am wrong.
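
For comparison, here is a sketch of a conversion that does not depend on knowing the host byte order at all (and so would also sidestep the PDP-endian question): it writes the bytes out least significant first and reads the result back as an integer.  This is not what the patch below does; the names to_le32, to_le64 and the *_byte helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return VAL rearranged so that its bytes in memory are in little-endian
   order, independent of the host byte order.  */
static uint32_t
to_le32 (uint32_t val)
{
  unsigned char bytes[sizeof val];
  size_t i;

  for (i = 0; i < sizeof val; i++)
    bytes[i] = (val >> (8 * i)) & 0xff;
  memcpy (&val, bytes, sizeof val);
  return val;
}

/* Likewise for 64-bit values.  */
static uint64_t
to_le64 (uint64_t val)
{
  unsigned char bytes[sizeof val];
  size_t i;

  for (i = 0; i < sizeof val; i++)
    bytes[i] = (val >> (8 * i)) & 0xff;
  memcpy (&val, bytes, sizeof val);
  return val;
}

/* Return byte INDEX of the in-memory representation of to_le32 (VAL).  */
static unsigned
le32_byte (uint32_t val, size_t index)
{
  unsigned char bytes[sizeof val];

  val = to_le32 (val);
  memcpy (bytes, &val, sizeof val);
  return bytes[index];
}

/* Return byte INDEX of the in-memory representation of to_le64 (VAL).  */
static unsigned
le64_byte (uint64_t val, size_t index)
{
  unsigned char bytes[sizeof val];

  val = to_le64 (val);
  memcpy (bytes, &val, sizeof val);
  return bytes[index];
}
```

On a little-endian host both conversions are the identity, and a compiler may or may not reduce the loops to a plain move or byte swap, so the builtin-based paths can still be preferable where speed matters.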

I am also willing to incorporate any feedback later, even if it is
only a matter of style.

Thanks,

Martin


2016-01-18  Martin Jambor  

* hsa-brig.c: Include target.h and tm_p.h.
---
 gcc/hsa-brig.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 9260c21..ee06804 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include "target.h"
+#include "tm_p.h"
 #include "is-a.h"
 #include "vec.h"
 #include "hash-table.h"
-- 
2.6.4


2016-01-18  Martin Jambor  

* hsa-brig.c (lendian16): New function.  Changed all uses of htole16
to use it.
(lendian32): New function.  Changed all uses of htole32 to use it.
(lendian64): New function.  Changed all uses of htole64 to use it.
---
 gcc/hsa-brig.c | 412 ++---
 1 file changed, 245 insertions(+), 167 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d4e644f..9260c21 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -44,6 +44,83 @@ along with GCC; see the file COPYING3.  If not see
 #include "hsa.h"
 #include "gomp-constants.h"
 
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint16_t
+lendian16 (uint16_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap16 (val);
+#else   /* __ORDER_PDP_ENDIAN__ */
+  return val;
+#endif
+#else
+// provide a safe slower default, with shifts and masking
+#ifndef WORDS_BIGENDIAN
+  return val;
+#else
+  return (val >> 8) | (val << 8);
+#endif
+#endif
+}
+
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint32_t
+lendian32 (uint32_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap32 (val);
+#else  /* __ORDER_PDP_ENDIAN__ */
+  return (val >> 16) | (val << 16);
+#endif
+#else
+// provide a safe slower default, with shifts and masking
+#ifndef WORDS_BIGENDIAN
+  return val;
+#else
+  val  = ((val & 0xff00ff00) >> 8) | ((val & 0xff00ff) << 8);
+  return (val >> 16) | (val << 16);
+#endif
+#endif
+}
+
+/* Convert VAL to little endian form, if necessary.  */
+
+static uint64_t
+lendian64 (uint64_t val)
+{
+#if GCC_VERSION >= 4006
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  return val;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  return __builtin_bswap64 (val);
+#else  /* __ORDER_PDP_ENDIAN__ */
+  return (((val & 0x) << 48)
+ | ((val & 0x) << 16)

Re: [hsa merge 00/10] Merge of HSA branch

2016-01-19 Thread Martin Jambor
Hi,

On Wed, Jan 13, 2016 at 06:39:25PM +0100, Martin Jambor wrote:
> Hi,
> 
> this is hopefully the last big re-post of the HSA patches...

I have committed the combined patch as revision 232549 after
bootstrapping and testing all languages on x86_64-linux and i686-linux
and verifying I did not break powerpc-aix more than it was before.

I will be updating gcc offloading wiki in a few days, meanwhile you
can use README.hsa file from the branch:

https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup

I will be also posting followup testsuite patches.

> 
> Thanks everybody for patience and feedback.  While we are of course
> open to more of it, let's also hope the approval process will
> finish soon as it should now.

I can't but repeat my thanks, especially to Jakub for the review and
help with the many last-minute issues.

Martin


[PR 69355] Correct hole detection when total_scalarization fails

2016-01-26 Thread Martin Jambor
Hi,

PR 69355 has revealed that when SRA attempts total scalarization of an
aggregate but this fails because the user type-casts a scalar field
and stores into it a smaller aggregate (and the scalar field is not
written to, whether directly or as a part of an aggregate store), the
pass can lose track of unscalarized data there.

I think that this can happen only when violating strict aliasing rules
but with -fno-strict-aliasing it should work.

Fixed thusly with the patch below (the condition is there to avoid
detecting padding after aggregate-fields in totally-scalarized
aggregates as unscalarized data).  Bootstrapped and tested on
x86_64-linux.  OK for trunk?  And the gcc-5 branch?

Thanks,

Martin


2016-01-26  Martin Jambor  

PR tree-optimization/69355
* tree-sra.c (analyze_access_subtree): Correct hole detection when
total_scalarization fails.

testsuite/
* gcc.dg/tree-ssa/pr69355.c: New test.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr69355.c | 44 +
 gcc/tree-sra.c  |  2 +-
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr69355.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c
new file mode 100644
index 000..f515c21
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-strict-aliasing" } */
+
+struct S
+{
+  void *a;
+  long double b;
+};
+
+struct Z
+{
+  long long l;
+  short s;
+} __attribute__((packed));
+
+struct S __attribute__((noclone, noinline))
+foo (void *v, struct Z *z)
+{
+  struct S t;
+  t.a = v;
+  *(struct Z *) &t.b = *z;
+  return t;
+}
+
+struct Z gz;
+
+int
+main (int argc, char **argv)
+{
+  struct S s;
+
+  if (sizeof (long double) < sizeof (struct Z))
+return 0;
+
+  gz.l = 0xbeef;
+  gz.s = 0xab;
+
+  s = foo ((void *) 0, &gz);
+
+  if ((((struct Z *) &s.b)->l != gz.l)
+  || (((struct Z *) &s.b)->s != gz.s))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 740542f..b0e737a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2421,7 +2421,7 @@ analyze_access_subtree (struct access *root, struct 
access *parent,
 
   if (covered_to < limit)
hole = true;
-  if (scalar)
+  if (scalar || !allow_replacements)
root->grp_total_scalarization = 0;
 }
 
-- 
2.7.0



Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-26 Thread Martin Jambor
On Fri, Jan 22, 2016 at 02:18:38PM +0100, Bernd Schmidt wrote:
> On 01/22/2016 09:36 AM, Jakub Jelinek wrote:
> >
> >I think it is a bad idea to go against what the user wrote.  Warning that
> >some code might not be efficient?  Perhaps (if properly guarded with some
> >warning option one can turn off, either on a per-source file or using
> >pragmas even more fine grained).  But by default not offloading?  That is
> >just wrong.
> 
> I'm leaning more towards Thomas' side of the argument. The kernels construct
> is a hint, a "do your best" request to the compiler. If the compiler sees
> that it can't parallelize a loop inside a kernels region, it's probably best
> not to offload it.
> 

Shouldn't such optimization feedback be output in MSG_NOTE dumps?
Vectorizer uses it to inform the user what it is doing, supposedly
with the intention to help the programmer find out why specific loops
are not vectorized (and run slowly).  I have also decided to use it to
inform the user whether a combination of OpenMP constructs is
gridified or not.

Unfortunately, notes seem to appear only in "detailed" dumps, which
often are not the best place for users to look into because of too
much information on gcc internals.  So the user interface aspect of
notes could perhaps be re-thought a bit.

In any event, I think that at least in the near term, good compiler
feedback could ease the efficient use of accelerators quite a lot,
like (they say) it did with early auto-vectorizing compilers.

Martin



Re: [hsa merge 00/10] Merge of HSA branch

2016-01-27 Thread Martin Jambor
Hi,

sorry for getting so late to this:

On Thu, Jan 21, 2016 at 05:10:17PM -0600, Gerald Pfeifer wrote:
> On Tue, 19 Jan 2016, Richard Biener wrote:
> > I think the merge warrants a NEWS entry on gcc.gnu.org/
> 
> ...and gcc-6/changes.html. :-)
> 
> Martin, happy to help.  Want to propose some text (or even patch)?
> 

So what would you think about the following?  Perhaps it is too
verbose but I wanted to mention the few areas users should know have
changed, if they happen to try HSA out.  I can certainly cut it down a
bit.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html25 Jan 2016 15:09:55 -  1.52
+++ changes.html27 Jan 2016 14:15:49 -
@@ -272,6 +272,30 @@

 
 
+Heterogeneous Systems Architecture
+   
+ GCC can now generate HSAIL for simple OpenMP device constructs
+   if configured with --enable-offload-targets=hsa.  A new
+   libgomp plugin then run these HSAIL kernels implementing these
+   constructs on HSA capable GPUs via standard HSA run-time.
+   
+   If the HSA compilation back-end determines it cannot output HSAIL
+   for a particular input, it gives a warning by default.  These
+   warnings can be suppressed with -Wno-hsa.  To give a
+   few examples, the HSA back-end does not implement compilation of
+   code using function pointers and variable-sized variables and
+   parameters, functions with variadic arguments as well as a number of
+   other less common programming constructs.
+
+   When compilation for HSA is enabled, the compiler attempts to
+   compile composite OpenMP constructs
+
+#pragma omp target teams distribute parallel for
+into parallel HSA GPU kernels.
+ 
+   
+
+
 IA-32/x86-64

  GCC now supports the Intel CPU named Skylake with AVX-512 extensions

The change to the news on the main page might then be:

Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.992
diff -u -r1.992 index.html
--- index.html  24 Jan 2016 23:54:36 -  1.992
+++ index.html  27 Jan 2016 14:16:25 -
@@ -52,6 +52,13 @@
 
 
 
+ Heterogeneous Systems Architecture support in GCC
+ [2016-01-27]
+ http://www.hsafoundation.com/";> Heterogeneous Systems
+ Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa";>
+ support was added to GCC.  Contributed by Martin Jambor, Martin Liška
+ and Michael Matz from SUSE.
+
 GCC 5.3 released
 [2015-12-04]
 

Any comments welcome.

Thanks,

Martin


Re: Martin Jambor appointed HSA Maintainer

2016-01-29 Thread Martin Jambor
Hi,

On Fri, Dec 18, 2015 at 08:41:41AM -0500, David Edelsohn wrote:
>   I am pleased to announce that the GCC Steering Committee has
> appointed Martin Jambor as HSA maintainer.
> 
>   Please join me in congratulating Martin on his new role.
> Martin, please update your listing in the MAINTAINERS file.
> 

thank you very much for your trust.  I will do my best when carrying
out the associated duties.  Now that HSA is also in, I have committed
the following change to the MAINTAINERS file.

Martin

2016-01-29  Martin Jambor  

* MAINTAINERS (hsa maintainers): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a5afeb7..aa757ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -209,6 +209,7 @@ fixincludes Bruce Korb  
 *gimpl*Jason Merrill   
 gcse.c Jeff Law
 global opt framework   Jeff Law
+hsa        Martin Jambor   
 jump.c David S. Miller 
 web pages  Gerald Pfeifer  
 config.sub/config.guessBen Elliston
-- 
2.7.0



[hsa] Atomic access memory model fixes

2016-01-29 Thread Martin Jambor
Hi,

this is a followup to comments by Jakub and Richi on handling of
memory models in HSA atomic operations:

- I have made user-visible diagnostics lower case simple words, rather
  than constant identifiers.

- I have added masking by MEMMODEL_BASE_MASK where appropriate.

- I have made sure that warning code does not crash even when it
  encounters an unknown model and that it never warns multiple times.

- I have fixed handling of atomic load operations which wrongly
  insisted on release semantics instead of acquire (apart from
  relaxed).

- And last but not least, after looking at the respective
  documentations, I have convinced myself that __ATOMIC_SEQ_CST can be
  implemented using the HSA scacq, screl and scar memory orders, so I
  implemented that.
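
The mapping described in the points above can be sketched in isolation as
follows.  This is an illustration under stated assumptions, not the
hsa-gen.c code: every enum, constant and function name here is a
hypothetical stand-in, and the value of M_BASE_MASK is assumed rather than
taken from GCC.

```c
/* All names below are hypothetical stand-ins for illustration; they are not
   the GCC MEMMODEL_* or BRIG_MEMORY_ORDER_* definitions.  */

enum model { M_RELAXED, M_CONSUME, M_ACQUIRE, M_RELEASE, M_ACQ_REL, M_SEQ_CST };
enum { M_BASE_MASK = 0xffff };	/* assumed: higher bits carry extra flags */

enum hsa_order { H_NONE, H_RELAXED, H_SCACQ, H_SCREL, H_SCAR };
enum access_kind { ACC_LOAD, ACC_STORE, ACC_RMW };

/* Map MODEL to an HSA memory order for an access of the given KIND.
   H_NONE means the combination is not supported.  */

enum hsa_order
map_order (unsigned model, enum access_kind kind)
{
  enum hsa_order order;

  switch (model & M_BASE_MASK)	/* ignore flag bits above the base model */
    {
    case M_RELAXED: order = H_RELAXED; break;
    case M_CONSUME:		/* no equivalent, use the stronger acquire */
    case M_ACQUIRE: order = H_SCACQ; break;
    case M_RELEASE: order = H_SCREL; break;
    case M_ACQ_REL:
    case M_SEQ_CST: order = H_SCAR; break;
    default: return H_NONE;	/* unknown model, caller diagnoses once */
    }

  if (kind == ACC_LOAD)
    {
      /* A plain load keeps only the acquire half.  */
      if (order == H_SCAR)
	order = H_SCACQ;
      return (order == H_RELAXED || order == H_SCACQ) ? order : H_NONE;
    }
  if (kind == ACC_STORE)
    {
      /* A plain store keeps only the release half.  */
      if (order == H_SCAR)
	order = H_SCREL;
      return (order == H_RELAXED || order == H_SCREL) ? order : H_NONE;
    }
  return order;
}
```

Returning H_NONE instead of diagnosing inside the helper keeps the warning
in one place, which is one way to make sure it is never issued twice.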

Bootstrapped and tested on x86_64-linux.  Since all of the above seems
to be worth fixing and low risk, I am going to commit it to trunk even at
this stage, even though of course nothing in HSA is a regression.

Thanks,

Martin


2016-01-29  Martin Jambor  

* hsa-gen.c (get_memory_order_name): Mask with MEMMODEL_BASE_MASK.
Use short lowercase names.
(get_memory_order): Mask with MEMMODEL_BASE_MASK.  Support
MEMMODEL_CONSUME with acquire semantics and MEMMODEL_SEQ_CST with
acq_rel one.  Protect warning against segfaults if
get_memory_order_name returns NULL.
(gen_hsa_ternary_atomic_for_builtin): Support with MEMMODEL_SEQ_CST
with release semantics.  Do not warn if get_memory_order already did.
(gen_hsa_insns_for_call): Support with MEMMODEL_SEQ_CST with acquire
semantics.  Fix check for relaxed or acquire semantics.  Do not warn
if get_memory_order already did.
---
 gcc/hsa-gen.c | 59 ---
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index e8f80da..768c2cf 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4415,20 +4415,20 @@ get_address_from_value (tree val, hsa_bb *hbb)
 static const char *
 get_memory_order_name (unsigned memmodel)
 {
-  switch (memmodel)
+  switch (memmodel & MEMMODEL_BASE_MASK)
 {
 case MEMMODEL_RELAXED:
-  return "__ATOMIC_RELAXED";
+  return "relaxed";
 case MEMMODEL_CONSUME:
-  return "__ATOMIC_CONSUME";
+  return "consume";
 case MEMMODEL_ACQUIRE:
-  return "__ATOMIC_ACQUIRE";
+  return "acquire";
 case MEMMODEL_RELEASE:
-  return "__ATOMIC_RELEASE";
+  return "release";
 case MEMMODEL_ACQ_REL:
-  return "__ATOMIC_ACQ_REL";
+  return "acq_rel";
 case MEMMODEL_SEQ_CST:
-  return "__ATOMIC_SEQ_CST";
+  return "seq_cst";
 default:
   return NULL;
 }
@@ -4440,21 +4440,31 @@ get_memory_order_name (unsigned memmodel)
 static BrigMemoryOrder
 get_memory_order (unsigned memmodel, location_t location)
 {
-  switch (memmodel)
+  switch (memmodel & MEMMODEL_BASE_MASK)
 {
 case MEMMODEL_RELAXED:
   return BRIG_MEMORY_ORDER_RELAXED;
+case MEMMODEL_CONSUME:
+  /* HSA does not have an equivalent, but we can use the slightly stronger
+ACQUIRE.  */
 case MEMMODEL_ACQUIRE:
   return BRIG_MEMORY_ORDER_SC_ACQUIRE;
 case MEMMODEL_RELEASE:
   return BRIG_MEMORY_ORDER_SC_RELEASE;
 case MEMMODEL_ACQ_REL:
+case MEMMODEL_SEQ_CST:
+  /* Callers implementing a simple load or store need to remove the release
+or acquire part respectively.  */
   return BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
 default:
-  HSA_SORRY_ATV (location,
-"support for HSA does not implement memory model: %s",
-get_memory_order_name (memmodel));
-  return BRIG_MEMORY_ORDER_NONE;
+  {
+   const char *mmname = get_memory_order_name (memmodel);
+   HSA_SORRY_ATV (location,
+  "support for HSA does not implement the specified "
+  " memory model%s %s",
+  mmname ? ": " : "", mmname ? mmname : "");
+   return BRIG_MEMORY_ORDER_NONE;
+  }
 }
 }
 
@@ -4523,13 +4533,20 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
   nops = 2;
 }
 
-  if (acode == BRIG_ATOMIC_ST && memorder != BRIG_MEMORY_ORDER_RELAXED
-  && memorder != BRIG_MEMORY_ORDER_SC_RELEASE)
+  if (acode == BRIG_ATOMIC_ST)
 {
-  HSA_SORRY_ATV (gimple_location (stmt),
-"support for HSA does not implement memory model for "
-"ATOMIC_ST: %s", get_memory_order_name (mmodel));
-  return;
+  if (memorder == BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE)
+   memorder = BRIG_MEMORY_ORDER_SC_RELEASE;
+
+  if (memorder != BRIG_MEMORY_ORDER_RELAXED
+ &

Re: [hsa merge 00/10] Merge of HSA branch

2016-02-02 Thread Martin Jambor
Hi,

On Thu, Jan 28, 2016 at 08:18:27AM -0700, Gerald Pfeifer wrote:
> 
> This is okay with the changes/considering the questions above.
> 

thanks for the feedback.  I have committed the following after
incorporating the comments.

Martin

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html25 Jan 2016 15:09:55 -  1.52
+++ changes.html2 Feb 2016 14:09:11 -
@@ -272,6 +272,30 @@

 
 
+Heterogeneous Systems Architecture
+   
+ GCC can now generate HSAIL (Heterogeneous System Architecture
+   Intermediate Language) for simple OpenMP device constructs if
+   configured with --enable-offload-targets=hsa.  A new
+   libgomp plugin then runs the HSA GPU kernels implementing these
+   constructs on HSA capable GPUs via a standard HSA run time.
+   
+   If the HSA compilation back end determines it cannot output HSAIL
+   for a particular input, it gives a warning by default.  These
+   warnings can be suppressed with -Wno-hsa.  To give a few
+   examples, the HSA back end does not implement compilation of code
+   using function pointers, automatic allocation of variable sized
+   arrays, functions with variadic arguments as well as a number of
+   other less common programming constructs.
+
+   When compilation for HSA is enabled, the compiler attempts to
+   compile composite OpenMP constructs
+
+#pragma omp target teams distribute parallel for
+into parallel HSA GPU kernels.
+ 
+   
+
 IA-32/x86-64

  GCC now supports the Intel CPU named Skylake with AVX-512 extensions




Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.993
diff -u -r1.993 index.html
--- index.html  30 Jan 2016 06:01:48 -  1.993
+++ index.html  2 Feb 2016 14:10:25 -
@@ -50,6 +50,13 @@
 News
 
 
+ Heterogeneous Systems Architecture support
+ [2016-01-27]
+ http://www.hsafoundation.com/";> Heterogeneous Systems
+ Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa";>
+ support was added to GCC, contributed by Martin Jambor, Martin Liška
+ and Michael Matz from SUSE.
+
 GCC 5.3 released
 [2015-12-04]
 


[hsa branch] Map collapse(2) and collapse(3) to HSA grid dimensions

2016-02-02 Thread Martin Jambor
Hi,

with HSA merged, the hsa branch can be used for development of new
features again.  Thus, I have committed there a patch which I finished
after the merge proposal and thus I kept in a private branch so far,
which allows collapse(2) and collapse(3) clauses to be gridified and
the individual loops to be directly mapped to HSA grid dimensions.

In order to achieve that, I needed to introduce HSA-specific builtins
which expand to HSAIL instructions giving information about specific
HSA grid dimensions.  I hope I have done that right, any comments are
welcome.
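
For reference, the kind of source this is aimed at looks roughly like the
sketch below.  The function name fill_grid and the ROWS and COLS macros are
made up for the example, and whether a particular loop nest really is
gridified depends on the checks in omp-low.c, so this only shows the shape
of the input.

```c
#define ROWS 4	/* hypothetical problem size */
#define COLS 8

/* Store i * COLS + j into OUT[i][j].  With collapse(2) both loops form one
   iteration space, which a gridifying compiler could map to two grid
   dimensions.  Without OpenMP support the pragma is ignored and the loop
   nest simply runs sequentially on the host.  */

void
fill_grid (int out[ROWS][COLS])
{
#pragma omp target teams distribute parallel for collapse(2) \
  map(from: out[0:ROWS])
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      out[i][j] = i * COLS + j;
}
```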

Other than that, the changes are small because as I was restructuring
the code, I was moving it in this direction for some time already.
Committed to the branch (a few days ago actually, sorry for that).

Thanks,

Martin


2016-01-26  Martin Jambor  

gcc/
* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def.
* builtins.def: Include hsa-builtins.def.
(DEF_HSA_BUILTIN): Define.
* hsa-builtins.def: New file.
* hsa-gen.c (query_hsa_grid): Accept dimension as an hsa_op_immed.
Add a new override.
(gen_hsa_insns_for_call): Handle BUILT_IN_HSA_GET_WORKITEM_ABSID.
* omp-low.c (grid_get_kernel_launch_attributes): Support up to
three dimensions.
(grid_expand_omp_for_loop): Likewise.
(lower_omp_for_lastprivate): Do not extract looptemps from grid loops.
(grid_target_follows_gridifiable_pattern): Allow collapse up to 3.
* tree-inline.h (copy_body_data): New field
decl_creation_prevention_level.  Moved remap_var_for_cilk to minimize
padding.

gcc/fortran/
* f95-lang.c: Include hsa-builtins.def.
(DEF_HSA_BUILTIN): Define.

libgomp/
* plugin/plugin-hsa.c (parse_target_attributes): Support up to three
dimensions.
(get_group_size): New function.
(GOMP_OFFLOAD_run): Support up to three dimensions.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ab9cbbf..a996708 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -899,7 +899,8 @@ RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 PARAMS_H = params.h params-enum.h params.def
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
-   gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def
+   gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def \
+   hsa-builtins.def
 INTERNAL_FN_DEF = internal-fn.def
 INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2fc7f65..14d2335 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -188,6 +188,16 @@ along with GCC; see the file COPYING3.  If not see
|| flag_cilkplus \
|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
+#undef DEF_HSA_BUILTIN
+#ifdef ENABLE_HSA
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
+   false, false, true, ATTRS, false, \
+  (!flag_disable_hsa))
+#else
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS)
+#endif
+
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
by the compiler but a few are implemented in libcilkrts.  */ 
 #undef DEF_CILK_BUILTIN_STUB
@@ -932,6 +942,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
ATTR_NOTHROW_LEAF_LIST)
 /* Offloading and Multi Processing builtins.  */
 #include "omp-builtins.def"
 
+/* Heterogeneous Systems Architecture.  */
+#include "hsa-builtins.def"
+
 /* Cilk keywords builtins.  */
 #include "cilk-builtins.def"
 
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 9c3a311..efa750de 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -1234,6 +1234,17 @@ gfc_init_builtin_functions (void)
 #undef DEF_GOMP_BUILTIN
 }
 
+#ifdef ENABLE_HSA
+  if (!flag_disable_hsa)
+{
+#undef DEF_HSA_BUILTIN
+#define DEF_HSA_BUILTIN(code, name, type, attr) \
+  gfc_define_builtin ("__builtin_" name, builtin_types[type], \
+ code, name, attr);
+#include "../hsa-builtins.def"
+}
+#endif
+
   gfc_define_builtin ("__builtin_trap", builtin_types[BT_FN_VOID],
  BUILT_IN_TRAP, NULL, ATTR_NOTHROW_LEAF_LIST);
   TREE_THIS_VOLATILE (builtin_decl_explicit (BUILT_IN_TRAP)) = 1;
diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
new file mode 100644
index 000..e4681c1
--- /dev/null
+++ b/gcc/hsa-builtins.def
@@ -0,0 +1,31 @@
+/* This file contains the definitions and documentation for the
+   Offloading and Multi Processing builtins used in the GNU compiler.
+   Copyright (C) 2005-2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU Gene

Re: [RFC] Extend ipa-bitwise-cp with pointer alignment propagation

2016-10-05 Thread Martin Jambor
Hi,

sorry, my main desktop disk has died a (slow but certain) death so I
am not particularly responsive either.

On Tue, Oct 04, 2016 at 12:37:38AM +0530, Prathamesh Kulkarni wrote:
> On 22 September 2016 at 17:26, Jan Hubicka  wrote:
> > Yes, can you please verify that alignments it computes are monotonously
> > worse than those your new code computes and include the removal in the
> > next iteration of the patch?
> >>
> > Otherwise the patch seems fine to me (modulo Richard's comments)


> I tried to verify the alignments are monotonously worse with the
> attached patch (verify.diff),
> which asserts that alignment lattice is not better than bits lattice
> during each propagation
> step in propagate_constants_accross_call().
> Does that look OK ?

After propagation, there should be no TOP lattices anywhere.  A TOP
would mean we have not deleted an unreachable node.  Apart from that,
yes.

> 
> ipa-cp-alignment has better alignments than ipa-bit-cp in following cases:
> 
> a) ipa_get_type() returns NULL: ipa-bits-cp sets lattice to bottom if
> ipa_get_type (param) returns NULL,
> for instance in case of K&R function, while ipa-cp-alignment doesn't

What do you mean by "for instance?"  What are the other cases when it
happens?

> look at param types,
> and can propagate alignments.
> The following assert:
> if (bits_lattice.bottom_p ())
>   gcc_assert (align_lattice.bottom_p())
> 
> triggered for 400.perlbench, 403.gcc, 456.hmmer and 481.wrf due to

that is many more examples than I had anticipated, so do they all
use K&R?   (But thanks for trying benchmarks diligently).

Have you also tried this with LTO?

> ipa_get_type()
> returning NULL. I am not really sure how to handle this case, since we
> need to know parameter's
> type during bits propagation for obtaining precision.
> 
> b) This happens for attached test-case (test.i),
> which is a reduced (and slightly modified) test-case from 458.sjeng.
> Bits propagation sets lattice to bottom, while alignment propagation
> propagates .

yes, I agree we do not need to worry about the case when alignment is 1.

I am only slightly concerned how often ipa_get_type is NULL, so it
would be nice if you looked into those cases once more to make sure
that we do not miss some bug or something that we could handle easily.
But if it is only K&R, I think it is fine.

Thanks,

Martin


[hsa-branch 4/9] Add expansion of reciprocal of square root

2016-10-10 Thread Martin Jambor
Hi,

this patch simply adds expansion of the reciprocal square root gimple
internal function into its HSAIL equivalent.

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-gen.c (gen_hsa_insn_for_internal_fn_call): Also handle IFN_RSQRT.
---
 gcc/hsa-gen.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index deb2a07..efb87a0 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5386,6 +5386,10 @@ gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb 
*hbb)
   gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
   break;
 
+case IFN_RSQRT:
+  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NRSQRT, stmt, hbb);
+  break;
+
 case IFN_TRUNC:
   gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
   break;
-- 
2.10.0



[hsa-branch 2/9] Lastprivate lowering for gridified kernels

2016-10-10 Thread Martin Jambor
Hi,

this patch implements the lastprivate data sharing clauses of gridified
OpenMP looping constructs.  It adds code to construct a special
condition to identify the "last" loop iteration using special HSA
instructions, because that way we do not need information about all HSA
dimensions conveyed from callers and could modify only a small fraction
of the non-gridification code.

On the gridification side, it creates group-segment copies of internal
loop lastprivate variables as means to transfer the value from the
"last" work-item to all work-items that then continue working with the
value.
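
As a reminder of the semantics that have to be preserved, here is a small
host-only sketch of lastprivate.  The function name last_square is made up
for the example, and it uses a plain parallel loop rather than a target
construct; a gridified loop has to produce the same observable result.

```c
/* Return the value X holds after the loop.  With lastprivate, that is the
   value assigned in the sequentially last iteration (i == n - 1), no
   matter which thread or work-item executed it.  N is assumed to be
   positive.  Without OpenMP support the pragma is ignored and the loop
   runs sequentially with the same result.  */

int
last_square (int n)
{
  int x = -1;

#pragma omp parallel for lastprivate(x)
  for (int i = 0; i < n; i++)
    x = i * i;

  return x;
}
```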

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* gimple.h (GF_OMP_FOR_GRID_PHONY): Added comment.
(GF_OMP_FOR_GRID_INTRA_GROUP): New.
(gimple_omp_for_grid_phony): Added checking assert.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_for_grid_intra_group): New function.
(gimple_omp_for_set_grid_intra_group): Likewise.
(gimple_omp_for_grid_group_iter): Added checking assert.
(gimple_omp_for_set_grid_group_iter): Likewise.
* omp-low.c (lower_lastprivate_clauses): Also handle predicates
that are not simple comparisons.
(grid_lastprivate_predicate): New function.
(lower_omp_for_lastprivate): Generate conditions for gridified kernels.
(lower_omp_for): Adjust phony predicate call.
(grid_parallel_clauses_gridifiable): Allow lastprivate.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_mark_tiling_loops): Generate copies of lastprivate variables
to group variables.
(grid_mark_tiling_parallels_and_loops): Create binds for bodies of
a parallel statements.
(grid_process_kernel_body_copy): Avoid reusing variable name.
---
 gcc/gimple.h  |  36 +
 gcc/omp-low.c | 235 +-
 2 files changed, 187 insertions(+), 84 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index ce3a161..3e84e6b0 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -162,7 +162,12 @@ enum gf_mask {
 GF_OMP_FOR_KIND_CILKSIMD   = GF_OMP_FOR_SIMD | 1,
 GF_OMP_FOR_COMBINED= 1 << 4,
 GF_OMP_FOR_COMBINED_INTO   = 1 << 5,
+/* The following flag must not be used on GF_OMP_FOR_KIND_GRID_LOOP loop
+   statements.  */
 GF_OMP_FOR_GRID_PHONY  = 1 << 6,
+/* The following two flags should only be set on GF_OMP_FOR_KIND_GRID_LOOP
+   loop statements.  */
+GF_OMP_FOR_GRID_INTRA_GROUP= 1 << 6,
 GF_OMP_FOR_GRID_GROUP_ITER  = 1 << 7,
 GF_OMP_TARGET_KIND_MASK= (1 << 4) - 1,
 GF_OMP_TARGET_KIND_REGION  = 0,
@@ -5123,6 +5128,8 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq 
pre_body)
 static inline bool
 gimple_omp_for_grid_phony (const gomp_for *omp_for)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  != GF_OMP_FOR_KIND_GRID_LOOP);
   return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
 }
 
@@ -5131,18 +5138,45 @@ gimple_omp_for_grid_phony (const gomp_for *omp_for)
 static inline void
 gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  != GF_OMP_FOR_KIND_GRID_LOOP);
   if (value)
 omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
   else
 omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
 }
 
+/* Return the kernel_intra_group of a GRID_LOOP OMP_FOR statement.  */
+
+static inline bool
+gimple_omp_for_grid_intra_group (const gomp_for *omp_for)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  == GF_OMP_FOR_KIND_GRID_LOOP);
+  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_INTRA_GROUP) != 0;
+}
+
+/* Set kernel_intra_group flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_grid_intra_group (gomp_for *omp_for, bool value)
+{
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  == GF_OMP_FOR_KIND_GRID_LOOP);
+  if (value)
+omp_for->subcode |= GF_OMP_FOR_GRID_INTRA_GROUP;
+  else
+omp_for->subcode &= ~GF_OMP_FOR_GRID_INTRA_GROUP;
+}
+
 /* Return true if iterations of a grid OMP_FOR statement correspond to HSA
groups.  */
 
 static inline bool
 gimple_omp_for_grid_group_iter (const gomp_for *omp_for)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  == GF_OMP_FOR_KIND_GRID_LOOP);
   return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_GROUP_ITER) != 0;
 }
 
@@ -5151,6 +5185,8 @@ gimple_omp_for_grid_group_iter (const gomp_for *omp_for)
 static inline void
 gimple_omp_for_set_grid_group_iter (gomp_for *omp_for, bool value)
 {
+  gcc_checking_assert (gimple_omp_for_kind (omp_for)
+  == GF_OMP_FOR_KIND_GRID_LOOP);
   if (value)
 omp_for->subcode |= G

[hsa-branch 3/9] Handle simds within gridified loops gracefully

2016-10-10 Thread Martin Jambor
Hi,

this patch deals with simd constructs in gridified OpenMP loops.
Standalone simds are dealt with by forcing the gridified copy to have
OMP_CLAUSE_SAFELEN_EXPR of one, while simds which are a part of a
combined construct with the gridified parallel loop are simply
discarded.
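
The two situations can be told apart in the source as in the following
sketch.  The function names and the N macro are hypothetical, and whether
either loop is actually gridified still depends on the other checks in
omp-low.c.

```c
#define N 64	/* hypothetical problem size */

/* The simd is a part of the combined construct with the parallel loop.  */

void
scale_combined (float *a, float s)
{
#pragma omp target teams distribute parallel for simd map(tofrom: a[0:N])
  for (int i = 0; i < N; i++)
    a[i] *= s;
}

/* A standalone simd nested in the body of the parallel loop.  */

void
scale_rows (float a[N][4], float s)
{
#pragma omp target teams distribute parallel for map(tofrom: a[0:N])
  for (int i = 0; i < N; i++)
    {
#pragma omp simd
      for (int j = 0; j < 4; j++)
	a[i][j] *= s;
    }
}
```

Both functions compute the same kind of result with or without OpenMP
support; only the way the iterations are distributed differs.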

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* omp-low.c (grid_find_ungridifiable_statement): Do not bail out
for simd loops.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_process_grid_body): New function.
(grid_eliminate_combined_simd_part): Likewise.
(grid_mark_tiling_loops): Use it. Walk body of the loop with
grid_process_grid_body.
(grid_process_kernel_body_copy): Likewise.
---
 gcc/omp-low.c | 137 +++---
 1 file changed, 122 insertions(+), 15 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 05015bd..a51474b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17478,17 +17478,6 @@ grid_find_ungridifiable_statement 
(gimple_stmt_iterator *gsi,
   *handled_ops_p = true;
   wi->info = stmt;
   return error_mark_node;
-
-case GIMPLE_OMP_FOR:
-  if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD)
- && gimple_omp_for_combined_into_p (stmt))
-   {
- *handled_ops_p = true;
- wi->info = stmt;
- return error_mark_node;
-   }
-  break;
-
 default:
   break;
 }
@@ -17614,10 +17603,6 @@ grid_inner_loop_gridifiable_p (gomp_for *gfor, 
grid_prop *grid)
dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
   GRID_MISSED_MSG_PREFIX "the inner loop contains "
   "call to a noreturn function\n");
- else if (gimple_code (bad) == GIMPLE_OMP_FOR)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
-GRID_MISSED_MSG_PREFIX "the inner loop contains "
-"a simd construct\n");
  else
dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc,
 GRID_MISSED_MSG_PREFIX "the inner loop contains "
@@ -18212,6 +18197,113 @@ grid_copy_leading_local_assignments (gimple_seq src, 
gimple_stmt_iterator *dst,
   return NULL;
 }
 
+/* Statement walker function to make adjustments to statements within the
+   gridified kernel copy.  */
+
+static tree
+grid_process_grid_body (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+   struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+  && (gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD))
+  {
+gomp_for *loop = as_a <gomp_for *> (stmt);
+tree clauses = gimple_omp_for_clauses (loop);
+tree cl = find_omp_clause (clauses, OMP_CLAUSE_SAFELEN);
+if (cl)
+  OMP_CLAUSE_SAFELEN_EXPR (cl) = integer_one_node;
+else
+  {
+   tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
+   OMP_CLAUSE_SAFELEN_EXPR (c) = integer_one_node;
+   OMP_CLAUSE_CHAIN (c) = clauses;
+   gimple_omp_for_set_clauses (loop, c);
+  }
+  }
+  return NULL_TREE;
+}
+
+/* Given a PARLOOP that is a normal for looping construct but also a part of a
+   combined construct with a simd loop, eliminate the simd loop.  */
+
+static void
+grid_eliminate_combined_simd_part (gomp_for *parloop)
+{
+  struct walk_stmt_info wi;
+
+  memset (&wi, 0, sizeof (wi));
+  wi.val_only = true;
+  enum gf_mask msk = GF_OMP_FOR_SIMD;
+  wi.info = (void *) &msk;
+  walk_gimple_seq (gimple_omp_body (parloop), find_combined_for, NULL, &wi);
+  gimple *stmt = (gimple *) wi.info;
+  /* We expect that the SIMD id the only statement in the parallel loop.  */
+  gcc_assert (stmt
+ && gimple_code (stmt) == GIMPLE_OMP_FOR
+ && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_SIMD)
+ && gimple_omp_for_combined_into_p (stmt)
+ && !gimple_omp_for_combined_p (stmt));
+  gomp_for *simd = as_a <gomp_for *> (stmt);
+
+  /* Copy over the iteration properties because the body refers to the index in
+ the bottmom-most loop.  */
+  unsigned i, collapse = gimple_omp_for_collapse (parloop);
+  gcc_checking_assert (collapse == gimple_omp_for_collapse (simd));
+  for (i = 0; i < collapse; i++)
+{
+  gimple_omp_for_set_index (parloop, i, gimple_omp_for_index (simd, i));
+  gimple_omp_for_set_initial (parloop, i, gimple_omp_for_initial (simd, 
i));
+  gimple_omp_for_set_final (parloop, i, gimple_omp_for_final (simd, i));
+  gimple_omp_for_set_incr (parloop, i, gimple_omp_for_incr (simd, i));
+}
+
+  tree *tgt= gimple_omp_for_clauses_ptr (parloop);
+  while (*tgt)
+tgt = &OMP_CLAUSE_CHAIN (*tgt);
+
+  /

[hsa-branch 6/9] Expand FMA_EXPR to HSAIL

2016-10-10 Thread Martin Jambor
Hi,

the following patch adds expansion of fused multiply and add to HSAIL.
The scalar variant is straightforwardly converted to an HSAIL equivalent
while any vector instance is expanded into a separate multiplication and
addition.
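
The split described above can be sketched in plain C as follows.  This
is a simplified model and not the code in hsa-gen.c: the names
sketch_opcode and sketch_expand_fma are hypothetical, and only the
choice of opcodes is modelled, not operand handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the BRIG opcodes involved.  */
enum sketch_opcode { SKETCH_MUL, SKETCH_ADD, SKETCH_MAD };

/* Record the opcodes that would be emitted for a fused multiply-add
   into OUT (room for at least two entries) and return how many.  */
static size_t
sketch_expand_fma (bool is_vector, enum sketch_opcode *out)
{
  assert (out != NULL);
  if (is_vector)
    {
      /* No vector FMA: multiply into a temporary, then add.  */
      out[0] = SKETCH_MUL;
      out[1] = SKETCH_ADD;
      return 2;
    }
  /* Scalars map onto a single multiply-and-add instruction.  */
  out[0] = SKETCH_MAD;
  return 1;
}
```

In this model a scalar FMA yields one instruction and a vector FMA
yields two, with the product held in a temporary between them.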

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Handle
FMA_EXPR and ternary operators in general.  Remove obsolete
fallthrough comments.
---
 gcc/hsa-gen.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ac83e9e..ad40087 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3076,6 +3076,23 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, 
hsa_bb *hbb)
 case NEGATE_EXPR:
   opcode = BRIG_OPCODE_NEG;
   break;
+case FMA_EXPR:
+  /* There is a native HSA instruction for scalar FMAs but not for vector
+ones.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
+   {
+ hsa_op_reg *dest
+   = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
+ hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+ hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+ hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+ hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+ gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
+ gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
+ return;
+   }
+  opcode = BRIG_OPCODE_MAD;
+  break;
 case MIN_EXPR:
   opcode = BRIG_OPCODE_MIN;
   break;
@@ -3275,14 +3292,18 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, 
hsa_bb *hbb)
   switch (rhs_class)
 {
 case GIMPLE_TERNARY_RHS:
-  gcc_unreachable ();
+  {
+   hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+   hsa_insn_basic *insn = new hsa_insn_basic (4, opcode, dest->m_type, 
dest,
+  op1, op2, op3);
+   hbb->append_insn (insn);
+  }
   return;
 
-  /* Fall through */
 case GIMPLE_BINARY_RHS:
   gen_hsa_binary_operation (opcode, dest, op1, op2, hbb);
   break;
-  /* Fall through */
+
 case GIMPLE_UNARY_RHS:
   gen_hsa_unary_operation (opcode, dest, op1, hbb);
   break;
-- 
2.10.0



[hsa-branch 1/9] Builtins for gridsize and currentworkgroupsize

2016-10-10 Thread Martin Jambor
Hi,

the patch below makes the gridsize and currentworkgroupsize special HSA
instructions available for omp lowering through builtins.  They are
then used by a subsequent patch to implement conditions determining the
last iteration for the lastprivate OpenMP sharing clause.

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-builtins.def (BUILT_IN_HSA_GRIDSIZE): New.
(BUILT_IN_HSA_CURRENTWORKGROUPSIZE): Likewise.
* hsa-gen.c (gen_hsa_insns_for_call): Handle BUILT_IN_HSA_GRIDSIZE.
---
 gcc/hsa-builtins.def | 4 
 gcc/hsa-gen.c| 6 ++
 2 files changed, 10 insertions(+)

diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
index dcd0c55..cc0409e 100644
--- a/gcc/hsa-builtins.def
+++ b/gcc/hsa-builtins.def
@@ -33,3 +33,7 @@ DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMID, "hsa_workitemid",
 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMABSID, "hsa_workitemabsid",
 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_GRIDSIZE, "hsa_gridsize",
+BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_CURRENTWORKGROUPSIZE, "hsa_currentworkgroupsize",
+BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index f63608c..deb2a07 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5812,6 +5812,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 case BUILT_IN_HSA_WORKITEMABSID:
   query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMABSID, hbb);
   break;
+case BUILT_IN_HSA_GRIDSIZE:
+  query_hsa_grid_dim (stmt, BRIG_OPCODE_GRIDSIZE, hbb);
+  break;
+case BUILT_IN_HSA_CURRENTWORKGROUPSIZE:
+  query_hsa_grid_dim (stmt, BRIG_OPCODE_CURRENTWORKGROUPSIZE, hbb);
+  break;
 
 case BUILT_IN_GOMP_BARRIER:
   hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER, 
BRIG_TYPE_NONE,
-- 
2.10.0



[hsa-branch 5/9] Properly detect variadic arguments

2016-10-10 Thread Martin Jambor
Hi,

this patch from Martin Liska properly detects some variadic calls which
we have failed to detect before during expansion to HSAIL.

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Liska  
Martin Jambor  

* hsa-gen.c (verify_function_arguments): Properly detect variadic
arguments.
---
 gcc/hsa-gen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index efb87a0..ac83e9e 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3444,13 +3444,14 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
 static void
 verify_function_arguments (tree decl)
 {
+  tree type = TREE_TYPE (decl);
   if (DECL_STATIC_CHAIN (decl))
 {
   HSA_SORRY_ATV (EXPR_LOCATION (decl),
 "HSA does not support nested functions: %D", decl);
   return;
 }
-  else if (!TYPE_ARG_TYPES (TREE_TYPE (decl)))
+  else if (!TYPE_ARG_TYPES (type) || stdarg_p (type))
 {
   HSA_SORRY_ATV (EXPR_LOCATION (decl),
 "HSA does not support functions with variadic arguments "
-- 
2.10.0



[hsa-branch 7/9] Ignore prefetch builtin

2016-10-10 Thread Martin Jambor
Hi,

this patch makes HSAIL expansion ignore prefetch built-ins.  It is a bit
less straightforward because we also need to handle cases where the call
does not pass the gimple_call_builtin_p test because of argument type
mismatches.

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-gen.c (gen_hsa_insns_for_call): Ignore prefetch builtin.
---
 gcc/hsa-gen.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ad40087..8893a28 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5530,6 +5530,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
 {
   tree function_decl = gimple_call_fndecl (stmt);
+  /* Prefetch pass can create type-mismatching prefetch builtin calls which
+fail the gimple_call_builtin_p test above.  Handle them here.  */
+  if (DECL_BUILT_IN_CLASS (function_decl)
+ && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
+   return;
+
   if (function_decl == NULL_TREE)
{
  HSA_SORRY_AT (gimple_location (stmt),
@@ -5962,6 +5968,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
gen_hsa_alloca (call, hbb);
break;
   }
+case BUILT_IN_PREFETCH:
+  break;
 default:
   {
gen_hsa_insns_for_direct_call (stmt, hbb);
-- 
2.10.0



[hsa-branch 8/9] Fail instead of calling an unknown GOMP builtin

2016-10-10 Thread Martin Jambor
Hi,

this patch is a bit of a hack to make sure we do not emit calls to
libgomp run-time functions which are not available on the HSA GPU side,
such as run-time loop scheduling routines.  If we fail at the caller
side, we avoid issues with the finalizer looking at calls to non-existing
functions.
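
The name test at the heart of this can be sketched as a stand-alone C
helper.  This is an illustration under assumptions, not the code in the
patch: sketch_is_gomp_builtin_name is a hypothetical name, and the
sketch relies on strncmp alone where the patch additionally checks the
length of the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix carried by the libgomp builtins that are to be rejected.  */
static const char sketch_gomp_prefix[] = "__builtin_GOMP_";

/* Return true if NAME starts with the GOMP builtin prefix.  */
static bool
sketch_is_gomp_builtin_name (const char *name)
{
  assert (name != NULL);
  size_t prefix_len = sizeof (sketch_gomp_prefix) - 1;
  return strncmp (name, sketch_gomp_prefix, prefix_len) == 0;
}
```

A caller would report the function as unsupported when the helper
returns true and fall back to emitting a direct call otherwise.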

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-gen.c (gen_hsa_insns_for_call): Fail when encountering a
GOMP builtin that we cannot process ourselves.
---
 gcc/hsa-gen.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 8893a28..fd0dbcd 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5972,7 +5972,15 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   break;
 default:
   {
-   gen_hsa_insns_for_direct_call (stmt, hbb);
+   tree name_tree = DECL_NAME (fndecl);
+   const char *s = IDENTIFIER_POINTER (name_tree);
+   size_t len = strlen (s);
+   if (len > 4 && (strncmp (s, "__builtin_GOMP_", 15) == 0))
+ HSA_SORRY_ATV (gimple_location (stmt),
+"support for HSA does not implement GOMP function %s",
+s);
+   else
+ gen_hsa_insns_for_direct_call (stmt, hbb);
return;
   }
 }
-- 
2.10.0



[hsa-branch 9/9] Fix another finalizer type complaint

2016-10-10 Thread Martin Jambor
Hi,

the following patch deals with a finalizer error issued when we have a
register-register move of an HSAIL vector type.  Apparently, such a move
must obey the same rules as vector loads and stores.

Committed to the branch, queued for merge to trunk soon.
Thanks,

Martin

2016-10-03  Martin Jambor  

* hsa-gen.c (hsa_build_append_simple_mov): Use mem_type_for_type.
---
 gcc/hsa-gen.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index fd0dbcd..0b25f66 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2227,8 +2227,10 @@ hsa_reg_or_immed_for_gimple_op (tree op, hsa_bb *hbb)
 void
 hsa_build_append_simple_mov (hsa_op_reg *dest, hsa_op_base *src, hsa_bb *hbb)
 {
-  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
-dest, src);
+  /* Moves of packed data between registers need to adhere to the same type
+ rules like when dealing with memory.  */
+  BrigType16_t tp = mem_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, tp, dest, 
src);
   if (hsa_op_reg *sreg = dyn_cast <hsa_op_reg *> (src))
 gcc_assert (hsa_type_bit_size (dest->m_type)
== hsa_type_bit_size (sreg->m_type));
-- 
2.10.0


Re: [hsa] depend nowait support for target

2015-11-25 Thread Martin Jambor
On Mon, Nov 23, 2015 at 03:16:42PM +0100, Jakub Jelinek wrote:
> On Mon, Nov 23, 2015 at 03:12:05PM +0100, Martin Jambor wrote:
> > +/* Thread routine to run a kernel asynchronously.  */
> > +
> > +static void *
> > +run_kernel_asynchronously (void *thread_arg)
> > +{
> > +  struct async_run_info *info = (struct async_run_info *) thread_arg;
> > +  int device = info->device;
> > +  void *tgt_fn = info->tgt_fn;
> > +  void *tgt_vars = info->tgt_vars;
> > +  void **args = info->args;
> > +  void *async_data = info->async_data;
> > +
> > +  free (info);
> > +  GOMP_OFFLOAD_run (device, tgt_fn, tgt_vars, args);
> > +  GOMP_PLUGIN_target_task_completion (async_data);
> > +  return NULL;
> 
> Is this just a temporary hack to work-around the missing task.c/target.c
> support for plugins that need polling (calling some hook) to determine
> completion of the tasks, or there is no way to tell HSA to spawn something
> asynchronously?
> Short term it is ok this way.

Basically yes.  There is no way to ask the HSA run-time to notify us of
kernel completion.  If libgomp provides a way to poll the device, I'll
gladly use that instead.

> 
> > +  int err = pthread_create (&pt, NULL, &run_kernel_asynchronously, info);
> > +  if (err != 0)
> > +GOMP_PLUGIN_fatal ("HSA asynchronous thread creation failed: %s",
> > +  strerror (err));
> > +  err = pthread_detach (pt);
> > +  if (err != 0)
> > +GOMP_PLUGIN_fatal ("Failed to detach a thread to run HRA kernel "
> > +  "asynchronously: %s", strerror (err));
> 
> HSA instead of HRA?
> 

Oh, thanks.  Will fix.

Martin


[hsa] omp_target_associate_ptr and omp_target_is_present on shared memory

2015-11-25 Thread Martin Jambor
Hi,

when looking at why target-12.c and target-24.c in
libgomp/testsuite/libgomp.c/ fail, I found two other places in libgomp's
target.c where shared-memory devices ought to be treated like the
host.  Committed to the branch.
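
The condition added in both places can be sketched as a small predicate.
This is a model under assumptions: the SKETCH_CAP_* bit values are
hypothetical (the real GOMP_OFFLOAD_CAP_* values live in libgomp) and
sketch_treat_like_host is not a libgomp function.

```c
#include <stdbool.h>

/* Hypothetical capability bits; the real values live in libgomp.  */
#define SKETCH_CAP_SHARED_MEM (1u << 0)
#define SKETCH_CAP_OPENMP_400 (1u << 2)

/* Return true if a device with CAPABILITIES should be handled exactly
   like the host: either it does not support OpenMP 4.0 offloading or
   it shares memory with the host.  */
static bool
sketch_treat_like_host (unsigned capabilities)
{
  return !(capabilities & SKETCH_CAP_OPENMP_400)
	 || (capabilities & SKETCH_CAP_SHARED_MEM);
}
```

When the predicate holds, a presence query would answer 1 and an
attempt to associate a pointer would be refused with EINVAL, as in the
ChangeLog entry for this patch.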

Thanks,

Martin


2015-11-25  Martin Jambor  

libgomp/
* target.c (omp_target_associate_ptr): Return EINVAL for shared
memory devices.
(omp_target_is_present): Return 1 for shared memory
devices.
---
 libgomp/target.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index f8a9803..b453c0c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1922,7 +1922,8 @@ omp_target_is_present (void *ptr, int device_num)
   if (devicep == NULL)
 return 0;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+  || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 return 1;
 
   gomp_mutex_lock (&devicep->lock);
@@ -2146,7 +2147,8 @@ omp_target_associate_ptr (void *host_ptr, void 
*device_ptr, size_t size,
   if (devicep == NULL)
 return EINVAL;
 
-  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+  || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 return EINVAL;
 
   gomp_mutex_lock (&devicep->lock);
-- 
2.6.0



[hsa] Fix static local variable name conflict

2015-11-25 Thread Martin Jambor
Hi,

the patch below makes libgomp/testsuite/libgomp.c/target-28.c pass on
HSA, where it previously did not like the two static variables with
the same name.  Committed to the branch. 
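
For illustration, the situation looks roughly like the following.  This
is an assumed example and not taken from target-28.c; the function
names are hypothetical.  Both variables have the declaration name
"counter" but they are distinct objects, so they need distinct symbol
names, which is what the assembler name provides.

```c
/* Two function-local statics that share a source-level name.  */
static int
sketch_next_even (void)
{
  static int counter = 0;
  counter += 2;
  return counter;
}

static int
sketch_next_odd (void)
{
  static int counter = 1;
  counter += 2;
  return counter;
}
```

Each function advances its own counter independently of the other,
which only works if the two variables end up as separate symbols.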

Thanks,

Martin


2015-11-25  Martin Jambor  

* hsa.c (hsa_get_declaration_name): Return ASM name for global
variables.

---
 gcc/hsa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/hsa.c b/gcc/hsa.c
index 7c9e0f6..8ab5da7 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -700,6 +700,8 @@ hsa_get_declaration_name (tree decl)
 }
   else if (TREE_CODE (decl) == FUNCTION_DECL)
 return cgraph_node::get_create (decl)->asm_name ();
+  else if (TREE_CODE (decl) == VAR_DECL && is_global_var (decl))
+return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
   else
 return IDENTIFIER_POINTER (DECL_NAME (decl));
 
-- 
2.6.0



[hsa] Describe grid with target clauses

2015-11-30 Thread Martin Jambor
Hi,

Jakub requested that I remove the grid description from new fields of
the classes representing gimple omp statements and put it into
special artificial clauses instead.  This patch implements that, with
one target clause per dimension (so up to three clauses) and each one
describing both the grid size and group size along that dimension
(hence the new clause type has two parameters).
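
The resulting shape of the data can be modelled in plain C as below.
This is a sketch under assumptions, not the tree representation used by
the patch: struct sketch_griddim_clause and sketch_count_dimensions are
hypothetical names, and sizes are plain integers here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one artificial grid-dimension clause.  */
struct sketch_griddim_clause
{
  unsigned dimension;			/* 0, 1 or 2.  */
  unsigned size;			/* Grid size along the dimension.  */
  unsigned group;			/* Group size along the dimension.  */
  struct sketch_griddim_clause *chain;	/* Next clause, if any.  */
};

/* Count the grid dimensions described by a clause chain.  */
static unsigned
sketch_count_dimensions (const struct sketch_griddim_clause *clauses)
{
  unsigned n = 0;
  for (; clauses != NULL; clauses = clauses->chain)
    n++;
  /* One clause per dimension, so at most three.  */
  assert (n <= 3);
  return n;
}
```

In this model a statement without any such clause is simply not
gridified, which mirrors how the presence of the clauses is used to
recognize gridified kernels.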

Committed to the branch, I will be preparing a new diff against the
trunk shortly.

Thanks,

Martin


2015-11-30  Martin Jambor  

* gimple.c (gimple_omp_target_init_dimensions): Removed.
* gimple.h (gimple_statement_omp_parallel_layout): Removed fields
dimensions and kernel_dim.
(gimple_omp_target_dimensions): Removed.
(gimple_omp_target_grid_size): Likewise.
(gimple_omp_target_grid_size_ptr): Likewise.
(gimple_omp_target_set_grid_size): Likewise.
(gimple_omp_target_workgroup_size): Likewise.
(gimple_omp_target_workgroup_size_ptr): Likewise.
(gimple_omp_target_set_workgroup_size): Likewise.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
(scan_omp_target): Do not scan kernel_dim.
(region_needs_kernel_p): Use clauses to recognize gridified kernels.
(get_kernel_launch_attributes): Generate launch attributes from
clauses.
(get_target_arguments): Use clauses to recognize gridified kernels.
(expand_target_kernel_body): Likewise.
(attempt_target_gridification): Record grid description into clauses.
* tree-core.h (omp_clause_code): New element OMP_CLAUSE__GRIDDIM_.
(tree_omp_clause): New subcode dimension.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
* tree.c (omp_clause_num_ops): Add number of operands of
OMP_CLAUSE__GRIDDIM_.
(omp_clause_code_name): Add name of OMP_CLAUSE__GRIDDIM_.
(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
---
 gcc/gimple.c| 11 ---
 gcc/gimple.h| 82 -
 gcc/omp-low.c   | 72 ++-
 gcc/tree-core.h |  9 +-
 gcc/tree-pretty-print.c | 12 
 gcc/tree.c  |  5 ++-
 gcc/tree.h  | 11 +++
 7 files changed, 79 insertions(+), 123 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index d876e90..4658f29 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1098,17 +1098,6 @@ gimple_build_omp_target (gimple_seq body, int kind, tree 
clauses)
   return p;
 }
 
-/* Set dimensions of TARGET to NUM and allocate kernel_dim array of the
-   statement with the appropriate number of elements.  */
-
-void
-gimple_omp_target_init_dimensions (gomp_target *target, size_t num)
-{
-  gcc_assert (num > 0);
-  target->dimensions = num;
-  target->kernel_dim = ggc_cleared_vec_alloc<gimple_omp_target_grid_dim> (num);
-}
-
 /* Build a GIMPLE_OMP_TEAMS statement.
 
BODY is the sequence of statements that will be executed.
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 14e6cf6..4c4c799 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -661,21 +661,7 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
  Shared data argument.  */
   tree data_arg;
 
-  /* TODO: Revisit placement of the following two fields.  On one hand, we
- currently only use them on target construct.  On the other, use on
- parallel construct is also possible in the future.  */
-
   /* [ WORD 11 ] */
-  /* Number of elements in kernel_iter array.  */
-  size_t dimensions;
-
-  /* [ WORD 12 ] */
-  /* If target also contains a GPU kernel, it should be run with the
- following grid sizes.  */
-  struct gimple_omp_target_grid_dim
-* GTY((length ("%h.dimensions"))) kernel_dim;
-
-  /* [ WORD 13 ] */
   /* If set, this statement is part of a gridified kernel, its clauses need to
  be scanned and lowered but the statement should be discarded after
  lowering.  */
@@ -1504,7 +1490,6 @@ gomp_sections *gimple_build_omp_sections (gimple_seq, 
tree);
 gimple *gimple_build_omp_sections_switch (void);
 gomp_single *gimple_build_omp_single (gimple_seq, tree);
 gomp_target *gimple_build_omp_target (gimple_seq, int, tree);
-void gimple_omp_target_init_dimensions (gomp_target *, size_t);
 gomp_teams *gimple_build_omp_teams (gimple_seq, tree);
 gomp_atomic_load *gimple_build_omp_atomic_load (tree, tree);
 gomp_atomic_store *gimple_build_omp_atomic_store (tree);
@@ -5683,73 +5668,6 @@ gimple_omp_target_set_data_arg (gomp_target 
*omp_target_stmt,
   omp_target_stmt->data_arg = data_arg;
 }
 
-/* Return the number of dimensions of kernel grid.  */
-
-static inline size_t
-gimple_omp_target_dimensions (gomp_target *omp_target_stmt)
-{
-  return omp_target_stmt->d

[hsa] Use proper accesses to gimple_omp_for

2015-11-30 Thread Martin Jambor
Hi,

when looking at the attempt_target_gridification function I realized I
forgot to replace some of the early code with proper gimple
statement access function calls.  This patch addresses that.
Committed to the branch.

Thanks,

Martin


2015-11-30  Martin Jambor  

* omp-low.c (attempt_target_gridification): Use proper access into
iter array of the inner loop.
---
 gcc/omp-low.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5933c60..bdf6539 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17484,21 +17484,21 @@ attempt_target_gridification (gomp_target *target, 
gimple_stmt_iterator *gsi,
   size_t collapse = gimple_omp_for_collapse (inner_loop);
   for (size_t i = 0; i < collapse; i++)
 {
-  gimple_omp_for_iter iter = inner_loop->iter[i];
-  walk_tree (&iter.initial, remap_prebody_decls, &wi, NULL);
-  walk_tree (&iter.final, remap_prebody_decls, &wi, NULL);
-
-  tree itype, type = TREE_TYPE (iter.index);
+  tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i));
   if (POINTER_TYPE_P (type))
itype = signed_type_for (type);
   else
itype = type;
 
-  enum tree_code cond_code = iter.cond;
-  tree n1 = iter.initial;
-  tree n2 = iter.final;
+  enum tree_code cond_code = gimple_omp_for_cond (inner_loop, i);
+  tree n1 = unshare_expr (gimple_omp_for_initial (inner_loop, i));
+  walk_tree (&n1, remap_prebody_decls, &wi, NULL);
+  tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i));
+  walk_tree (&n2, remap_prebody_decls, &wi, NULL);
   adjust_for_condition (loc, &cond_code, &n2);
-  tree step = get_omp_for_step_from_incr (loc, iter.incr);
+  tree step;
+  step = get_omp_for_step_from_incr (loc,
+gimple_omp_for_incr (inner_loop, i));
   n1 = force_gimple_operand_gsi (gsi, fold_convert (type, n1), true,
 NULL_TREE, true, GSI_SAME_STMT);
   n2 = force_gimple_operand_gsi (gsi, fold_convert (itype, n2), true,
-- 
2.6.0



[hsa] Use gimplify_expr in gridification

2015-11-30 Thread Martin Jambor
Hi,

doing some more testing of the branch and combining two of my
testcases I came across a bug where temporaries created by
force_gimple_operand_gsi were not added to the proper bind and thus
were subsequently re-mapped to error_mark when the target construct
was within some other omp construct.  Fixed with this patch, where
pop_gimplify_context does the right thing like at other places in
omp-low.c.  Committed to the branch.

Thanks,

Martin



2015-11-30  Martin Jambor  

* omp-low.c (attempt_target_gridification): Use gimplify_expr.
---
 gcc/omp-low.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index bdf6539..7fbdcdf 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17481,6 +17481,7 @@ attempt_target_gridification (gomp_target *target, 
gimple_stmt_iterator *gsi,
  gpukernel);
 
   walk_tree (&group_size, remap_prebody_decls, &wi, NULL);
+  push_gimplify_context ();
   size_t collapse = gimple_omp_for_collapse (inner_loop);
   for (size_t i = 0; i < collapse; i++)
 {
@@ -17499,30 +17500,32 @@ attempt_target_gridification (gomp_target *target, 
gimple_stmt_iterator *gsi,
   tree step;
   step = get_omp_for_step_from_incr (loc,
 gimple_omp_for_incr (inner_loop, i));
-  n1 = force_gimple_operand_gsi (gsi, fold_convert (type, n1), true,
-NULL_TREE, true, GSI_SAME_STMT);
-  n2 = force_gimple_operand_gsi (gsi, fold_convert (itype, n2), true,
-NULL_TREE,
-true, GSI_SAME_STMT);
+  gimple_seq tmpseq = NULL;
+  n1 = fold_convert (itype, n1);
+  n2 = fold_convert (itype, n2);
   tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
   t = fold_build2 (PLUS_EXPR, itype, step, t);
   t = fold_build2 (PLUS_EXPR, itype, t, n2);
-  t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, n1));
+  t = fold_build2 (MINUS_EXPR, itype, t, n1);
   if (TYPE_UNSIGNED (itype) && cond_code == GT_EXPR)
t = fold_build2 (TRUNC_DIV_EXPR, itype,
 fold_build1 (NEGATE_EXPR, itype, t),
 fold_build1 (NEGATE_EXPR, itype, step));
   else
t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
-  t = fold_convert (uint32_type_node, t);
-  tree gs = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
- GSI_SAME_STMT);
+  tree gs = fold_convert (uint32_type_node, t);
+  gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+  if (!gimple_seq_empty_p (tmpseq))
+   gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
+
   tree ws;
   if (i == 0 && group_size)
{
  ws = fold_convert (uint32_type_node, group_size);
- ws = force_gimple_operand_gsi (gsi, ws, true, NULL_TREE, true,
-GSI_SAME_STMT);
+ tmpseq = NULL;
+ gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+ if (!gimple_seq_empty_p (tmpseq))
+   gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
}
   else
ws = build_zero_cst (uint32_type_node);
@@ -17534,7 +17537,7 @@ attempt_target_gridification (gomp_target *target, 
gimple_stmt_iterator *gsi,
   OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
   gimple_omp_target_set_clauses (target, c);
 }
-
+  pop_gimplify_context (tgt_bind);
   delete declmap;
   return;
 }
-- 
2.6.0



[hsa] Useful checking assert in scan_omp_1_op

2015-12-03 Thread Martin Jambor
Hi,

I have found the following checking assert very useful
when debugging omp lowering issues, so I have added it to the hsa
branch.  I hope that nobody will mind, but it of course is not an
essential thing to have if someone does.

Thanks,

Martin

2015-12-03  Martin Jambor  

* omp-low.c (scan_omp_1_op): Add checking assert that we are not
re-mapping to ERROR_MARK.
---
 gcc/omp-low.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 8854df7..05d8901 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3731,7 +3731,11 @@ scan_omp_1_op (tree *tp, int *walk_subtrees, void *data)
 case LABEL_DECL:
 case RESULT_DECL:
   if (ctx)
-   *tp = remap_decl (t, &ctx->cb);
+   {
+ tree repl = remap_decl (t, &ctx->cb);
+ gcc_checking_assert (TREE_CODE (repl) != ERROR_MARK);
+ *tp = repl;
+   }
   break;
 
 default:
-- 
2.6.3



[hsa] Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

2015-12-03 Thread Martin Jambor
Hi,

this is a fix to the last "last" ICE of the hsa branch.  The problem
turned out not to be in the gridification itself but, depending on your
point of view, in the gimple and tree walking infrastructure or in
function copy_gimple_seq_and_replace_locals from tree-inline.c on
which hsa gridification relies.

The issue is that in between gimplification and omplow pass, there can
be gimple sequences attached to OMP_CLAUSE trees that are attached to
omp statements and that are neither copied by gimple_seq_copy nor
walked by walk_gimple_seq.

While the correct solution would probably be to extend tree and gimple
walkers to handle them, that would be a big change.  I talked
with Jakub about this yesterday on IRC and he suggested that I
enhance the internal walkers of copy_gimple_seq_and_replace_locals
to deal with this situation.  Even though that leaves gimple_seq_copy,
walk_gimple_seq and others technically incorrect, that is what I
have done in the patch below, which fixes my last ICEs and which I
have already committed to the branch.
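
The underlying problem is the usual one of a shallow copy sharing
nested data.  A minimal model in plain C, assuming a clause that merely
points at its sequence, is given below; struct sketch_clause and
sketch_duplicate_clause_seq are hypothetical names and an array of
integers stands in for a gimple sequence.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: a clause pointing at a sequence of statements,
   here represented by plain integers.  */
struct sketch_clause
{
  int *seq;
  size_t len;
};

/* Give CLAUSE its own copy of the sequence it points at so that it no
   longer shares it with the clause it was copied from.  Return 0 on
   success and -1 if memory could not be allocated.  */
static int
sketch_duplicate_clause_seq (struct sketch_clause *clause)
{
  if (clause->len == 0)
    return 0;
  int *copy = malloc (clause->len * sizeof *copy);
  if (copy == NULL)
    return -1;
  memcpy (copy, clause->seq, clause->len * sizeof *copy);
  clause->seq = copy;
  return 0;
}
```

After a structure assignment both clauses point at the same sequence;
only once the helper has run can the copy be remapped without
disturbing the original.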

Any feedback is of course very much appreciated,

Martin


2015-12-03  Martin Jambor  

* tree-inline.c (duplicate_remap_omp_clause_seq): New function.
(replace_locals_op): Duplicate gimple sequences in OMP clauses.

---
 gcc/tree-inline.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ebab189..15141dc 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq,
+ struct walk_stmt_info *wi);
 
 /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local.
Using the splay_tree pointed to by ST (which is really a `splay_tree'),
@@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void 
*data)
  TREE_OPERAND (expr, 3) = NULL_TREE;
}
 }
+  else if (TREE_CODE (expr) == OMP_CLAUSE)
+{
+  /* Before the omplower pass completes, some OMP clauses can contain
+sequences that are neither copied by gimple_seq_copy nor walked by
+walk_gimple_seq.  To make copy_gimple_seq_and_replace_locals work even
+in those situations, we have to copy and process them explicitely.  */
+
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE)
+   {
+ gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR)
+   {
+ gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+   {
+ gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq;
+ seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq;
+   }
+}
 
   /* Keep iterating.  */
   return NULL_TREE;
@@ -5200,6 +5231,18 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+/* Create a copy of SEQ and remap all decls in it.  */
+
+static gimple_seq
+duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
+{
+  /* If there are any labels in OMP sequences, they can be only referred to in
+ the sequence itself and therefore we can do both here.  */
+  walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi);
+  gimple_seq copy = gimple_seq_copy (seq);
+  walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi);
+  return copy;
+}
 
 /* Copies everything in SEQ and replaces variables and labels local to
current_function_decl.  */
-- 
2.6.3



[hsa 0/10] Merge of HSA branch

2015-12-07 Thread Martin Jambor
Hi,

I'm sorry it took me more than a month to come up with another round
of patches aiming at merging the HSA branch into the trunk.  Keeping
up to date with the latest changes in the OpenMP 4.5 area was
strenuous and we have discovered and fixed a few bugs as I intensified
my testing efforts.

While those are the main areas where this patch set differs from the
previous one, I have of course addressed the feedback I got the last
time, including implementing device-specific OpenMP target arguments,
moving kernel grid size from gimple class fields to new artificial
clauses and disabling the vectorizer for HSA functions using
DECL_FUNCTION_SPECIFIC_OPTIMIZATION rather than extra code in
respective pass gates.

Because I have not been able to come up with any solution to failing
libgomp/testsuite/libgomp.c++/target-2.C, I have disabled use of
dynamic parallelism in this merge (I keep it on the branch) and
therefore entirely rely on the gridification process to run loops on
the accelerator, because gridified constructs do not have this issue
(passing private symbols by reference).

HSA tests are still missing, I would need some guidance as to how to
best implement them (especially to test gridification which of course
does not happen for other accelerators).  There are no failing
testcases if HSA is not configured.  If it is, there are some, all of
which fall into one of the following categories:

  1) HSA cannot compile a function for one reason or another (most
 common cause is inability of HSA to take an address of a function
 or make an indirect call) and gives a warning, which is regarded
 as an "excess error" by dejagnu.

  2) When HSA is not emitted for a function, libgomp runs a host
 fallback instead of it.  When the test queries
 omp_is_initial_device and asserts it returns false, the test
 fails.

  3) There are still a few failing OpenACC tests, but those just
 should not be run.

Of course, the patch set bootstraps fine on x86_64-linux with or
without configured HSA.

Any feedback is welcome.  Thanks,

Martin


[hsa 1/10] Configury changes and new options

2015-12-07 Thread Martin Jambor
Hi,

this patch contains changes to the configuration mechanism and offload
bits, so that users can build compilers with HSA support. It plays
nicely with other accelerators despite using an altogether different
implementation approach.  I have also added to it definitions of the
new options and parameters, since at least one hunk in common.opt is
highly related.  -fdisable-hsa-gridification has disappeared; otherwise
very little has changed since the last submission.

With this patch, the user can request HSA support by including the
string "hsa" among the requested accelerators in
--enable-offload-targets.  This will cause the compiler to start
producing HSAIL for target OpenMP constructs/functions and the hsa
libgomp plugin to be built.  Because the plugin needs to use the HSA
run-time library, I have introduced the option --with-hsa-runtime (and
the more precise --with-hsa-runtime-include and --with-hsa-runtime-lib)
to help find it.  The open-sourced HSA run-time available on GitHub is
binary compatible with the closed-source one.  However, only the
closed-source one contains the finalizer, and so it is the one that
needs to be used for all practical purposes.  I am regularly asking AMD
to keep their promise and open source the finalizer too.

One catch, however, is that there is no offload compiler for HSA, and
so the wrapper should not attempt to look for it (that is what the hunk
in lto-wrapper.c does).  Moreover, it is wasteful to output LTO
sections with byte-code when HSA is the only configured accelerator, so
in that case configure does not define the ENABLE_OFFLOADING macro.

Finally, when the compiler has been configured for HSA but the user
disables it by omitting it in the -foffload compiler option, we need
to observe that decision.  That is what the opts.c hunk does.

As far as the options are concerned, the patch adds a new warning,
-Whsa, which we emit whenever we fail to produce HSAIL for some source
code.  It is on by default, but warnings are of course only emitted by
HSAIL generating code and so will never affect anybody who does not use
both an HSA-enabled compiler and OpenMP 4 device constructs.

Then there is a new parameter hsa-gen-debug-stores, which will be
obsolete once the HSA run-time supports debugging traps.  Before that, we
have to make do with debugging stores to memory at defined places, which
however can cost speed in benchmarks.  So we only enable them with
this parameter.  We decided to make it a parameter rather than a
switch to emphasize the fact that it will go away and possibly to allow
us to select different levels of verbosity of the stores in the future.

Any feedback is very much appreciated,

Martin



2015-12-04  Martin Jambor  

gcc/
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include and --with-hsa-runtime-lib.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa accelerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.

libgomp/plugin/
* Makefrag.am: Add HSA plugin requirements.
* configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bee2879..5fe73a7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1296,6 +1296,11 @@ OBJS = \
graphite-sese-to-poly.o \
gtype-desc.o \
haifa-sched.o \
+   hsa.o \
+   hsa-gen.o \
+   hsa-regalloc.o \
+   hsa-brig.o \
+   hsa-dump.o \
hw-doloop.o \
hwint.o \
ifcvt.o \
@@ -1320,6 +1325,7 @@ OBJS = \
ipa-icf.o \
ipa-icf-gimple.o \
ipa-reference.o \
+   ipa-hsa.o \
ipa-ref.o \
ipa-utils.o \
ipa.o \
@@ -2401,6 +2407,7 @@ GTFILES = $(CPP

[hsa 2/10] Modifications to libgomp proper

2015-12-07 Thread Martin Jambor
Hi,

The patch below contains all changes to libgomp files except for the
hsa plugin (which is in the following patch).

The changes can be roughly divided into three categories.  First, it
contains changes that are necessary to support shared-memory
devices.  In majority of cases this means treating them like the host
fallback because there is no need to copy, host malloc can be used for
allocating etc.  It also means that GOMP_target_ext and
gomp_target_task_fn should not be remapping arguments but should pass
to the plugin the same thing host fallback function would receive.

Second, because the GCC HSA backend often does not emit HSAIL for functions
it knows it cannot handle, these two functions need to gracefully
handle the case when there is no device implementation of a particular
function available by doing host fallback too.
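
The fallback behaviour described in the two paragraphs above can be
sketched roughly as follows.  This is a simplified illustration and not
the libgomp code: struct ex_device, its fields, ex_run_target and the
stub functions are all hypothetical names.  Only the idea comes from the
description above (a shared-memory device receives the host arguments
unchanged, and a missing device implementation leads to host fallback).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device descriptor; not the libgomp structure.  */
struct ex_device
{
  bool shared_memory;
  /* Return a device handle for HOST_FN, or NULL if the compiler emitted
     no device implementation for it.  */
  void *(*lookup_fn) (void (*host_fn) (void *));
  /* Run the device function DEV_FN with the unmodified host ARGS.  */
  void (*run) (void *dev_fn, void *args);
};

/* Run HOST_FN on DEV when possible and on the host otherwise.  Return true
   if the device was used.  */
bool
ex_run_target (struct ex_device *dev, void (*host_fn) (void *), void *args)
{
  void *dev_fn = NULL;

  /* This sketch only models shared-memory devices; anything else is
     treated as "no device" here.  */
  if (dev != NULL && dev->shared_memory)
    dev_fn = dev->lookup_fn (host_fn);

  if (dev_fn == NULL)
    {
      host_fn (args);           /* Host fallback.  */
      return false;
    }

  dev->run (dev_fn, args);      /* No remapping of ARGS is needed.  */
  return true;
}

/* Stubs used only to exercise the sketch.  */
static int ex_token;

void *
ex_lookup_none (void (*host_fn) (void *))
{
  (void) host_fn;
  return NULL;
}

void *
ex_lookup_found (void (*host_fn) (void *))
{
  (void) host_fn;
  return &ex_token;
}

void
ex_device_run (void *dev_fn, void *args)
{
  (void) dev_fn;
  *(int *) args = 42;
}

void
ex_host_body (void *args)
{
  *(int *) args = 1;
}
```

With a descriptor whose lookup returns NULL, ex_run_target runs
ex_host_body and reports false; with ex_lookup_found it calls the
device run hook with the very same argument pointer.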

Third, the patch implements the libgomp part of the device-specific
arguments passed to GOMP_target as requested by Jakub (well, some are
actually for all devices but that is what we call them).  Because of
nowait target constructs, the arguments have proliferated into tasking
too, as did firstprivate copies.
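
As an illustration of how such device-specific arguments might be
packed into pointer-sized words, here is a small sketch.  The bit layout
used (device number in the low 7 bits, bit 7 as a "value is in the next
element" flag, an identifier in bits 8-15 and a small value from bit 16
upwards) and all EX_ names are assumptions made for this example; the
authoritative definitions are the GOMP_TARGET_ARG_* macros added to
gomp-constants.h by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, for illustration only.  */
#define EX_ARG_DEVICE_MASK       ((1u << 7) - 1)
#define EX_ARG_DEVICE_ALL        0u
#define EX_ARG_SUBSEQUENT_PARAM  (1u << 7)
#define EX_ARG_ID_MASK           (((1u << 8) - 1) << 8)
#define EX_ARG_NUM_TEAMS         (1u << 8)
#define EX_ARG_THREAD_LIMIT      (2u << 8)
#define EX_ARG_VALUE_SHIFT       16

/* Pack a small VALUE for argument ID, intended for DEVICE, in one word.  */
uintptr_t
ex_encode_arg (unsigned device, unsigned id, unsigned value)
{
  return ((uintptr_t) value << EX_ARG_VALUE_SHIFT)
         | (id & EX_ARG_ID_MASK)
         | (device & EX_ARG_DEVICE_MASK);
}

unsigned
ex_arg_device (uintptr_t arg)
{
  return (unsigned) (arg & EX_ARG_DEVICE_MASK);
}

unsigned
ex_arg_id (uintptr_t arg)
{
  return (unsigned) (arg & EX_ARG_ID_MASK);
}

/* Read the value of the argument at ARGS[*POS] and advance *POS past it.
   Large values live in the following array element.  */
uintptr_t
ex_read_value (const uintptr_t *args, size_t *pos)
{
  uintptr_t arg = args[(*pos)++];
  if (arg & EX_ARG_SUBSEQUENT_PARAM)
    return args[(*pos)++];
  return arg >> EX_ARG_VALUE_SHIFT;
}
```

Keeping the device number in each element lets a plugin skip arguments
that are meant for a different kind of accelerator.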

Any feedback will be greatly appreciated,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
(GOMP_kernel_launch_attributes): New type.
(GOMP_hsa_kernel_dispatch): New type.

libgomp/
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New field args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Change prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (gomp_target_fallback_firstprivate): New function.
(gomp_target_fallback_firstprivate): Use
gomp_target_fallback_firstprivate.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices.  Do not
pass any args to plugins.
(GOMP_target_ext): Add new parameter args.  Allow host fallback if
device shares memory.  Do not remap data if device has shared memory.
(gomp_target_task_fn): Likewise.  Also Treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.

liboffloadmic/plugin
* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused
parameter.
(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index dffd631..1dae474 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -176,6 +176,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NOT_HOST   4
 #define GOMP_DEVICE_NVIDIA_PTX 5
 #define GOMP_DEVICE_INTEL_MIC  6
+#define GOMP_DEVICE_HSA        7
 
 #define GOMP_DEVICE_ICV        -1
 #define GOMP_DEVICE_HOST_FALLBACK  -2
@@ -201,6 +202,7 @@ enum gomp_map_kind
 #define GOMP_VERSION   0
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
+#define GOMP_VERSION_HSA 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x)
@@ -228,4 +230,74 @@ enum gomp_map_kind
 #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x)
 #define GOMP_LAUNCH_OP_MAX 0x
 
+/* Bitmask to apply in order to find out the intended device of a target
+   argument.  */
+#define GOMP_TARGET_ARG_DEVICE_MASK((1 << 7)

[hsa 3/10] HSA libgomp plugin

2015-12-07 Thread Martin Jambor
Hi,

the patch below adds the HSA-specific plugin for libgomp.  The plugin
implements the interface mandated by libgomp and takes care of finding
any available HSA devices, finalizing HSAIL code and running it on
HSA-capable GPUs.  The plugin does not really implement any data
movement functions (it implements them with a fatal error call)
because memory is shared in HSA environments and the previous patch
has modified libgomp proper not to call those functions on devices
with this capability.

The changes since the last submission include version checks,
receiving grid sizes through a device-specific parameter and support
for asynchronous execution.
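
A version check of this kind could look roughly like the sketch below.
It mirrors the GOMP_VERSION_PACK scheme from gomp-constants.h (library
version in the upper half, device version in the lower half), but the
EX_ names, the 16-bit mask and the function ex_image_version_ok are
assumptions made for illustration, not the plugin's actual code.

```c
/* Assumed packing: library version above bit 16, device version below.  */
#define EX_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
#define EX_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
#define EX_VERSION_DEV(PACK) ((PACK) & 0xffff)

/* Hypothetical newest device version this plugin understands.  */
#define EX_VERSION_HSA 0u

/* Return nonzero if an image built for packed VERSION can be loaded.  */
int
ex_image_version_ok (unsigned version)
{
  return EX_VERSION_DEV (version) <= EX_VERSION_HSA;
}
```

An image whose device version is newer than the one the plugin knows
would be rejected before any attempt to finalize its HSAIL.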

Any feedback will be greatly appreciated,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

* plugin/plugin-hsa.c: New file.

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
new file mode 100644
index 000..b132954
--- /dev/null
+++ b/libgomp/plugin/plugin-hsa.c
@@ -0,0 +1,1449 @@
+/* Plugin for HSAIL execution.
+
+   Copyright (C) 2013-2015 Free Software Foundation, Inc.
+
+   Contributed by Martin Jambor  and
+   Martin Liska .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include 
+#include 
+#include 
+#include 
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+#include "hsa.h"
+#include "hsa_ext_finalize.h"
+#include "dlfcn.h"
+
+/* Part of the libgomp plugin interface.  Return the name of the accelerator,
+   which is "hsa".  */
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "hsa";
+}
+
+/* Part of the libgomp plugin interface.  Return the specific capabilities the
+   HSA accelerator has.  */
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400;
+}
+
+/* Part of the libgomp plugin interface.  Identify as HSA accelerator.  */
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_HSA;
+}
+
+/* Return the libgomp version number we're compatible with.  There is
+   no requirement for cross-version compatibility.  */
+
+unsigned
+GOMP_OFFLOAD_version (void)
+{
+  return GOMP_VERSION;
+}
+
+/* Flag to decide whether to print information about what is going on to stderr.
+   Set in init_debug depending on environment variables.  */
+
+static bool debug;
+
+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;
+
+/* Initialize debug and suppress_host_fallback according to the environment.  */
+
+static void
+init_enviroment_variables (void)
+{
+  if (getenv ("HSA_DEBUG"))
+debug = true;
+  else
+debug = false;
+
+  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+suppress_host_fallback = true;
+  else
+suppress_host_fallback = false;
+}
+
+/* Print a logging message with PREFIX to stderr if HSA_DEBUG value
+   is set to true.  */
+
+#define HSA_LOG(prefix, ...) \
+  do \
+  { \
+if (debug) \
+  { \
+   fprintf (stderr, prefix); \
+   fprintf (stderr, __VA_ARGS__); \
+  } \
+  } \
+  while (false);
+
+/* Print a debugging message to stderr.  */
+
+#define HSA_DEBUG(...) HSA_LOG ("HSA debug: ", __VA_ARGS__)
+
+/* Print a warning message to stderr.  */
+
+#define HSA_WARNING(...) HSA_LOG ("HSA warning: ", __VA_ARGS__)
+
+/* Print HSA warning STR with an HSA STATUS code.  */
+
+static void
+hsa_warn (const char *str, hsa_status_t status)
+{
+  if (!debug)
+return;
+
+  const char* hsa_error;
+  hsa_status_string (status, &hsa_error);
+
+  unsigned l = strlen (hsa_error);
+
+  char *err = GOMP_PLUGIN_malloc (sizeof (char) * (l + 1));
+  memcpy (err, hsa_error, l);
+  err[l] = '\0';
+
+  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);
+
+  free (err);
+}
+
+/* Report a fatal error STR together with the HSA error corresponding to STATUS
+   and terminate execution of the current pr

[hsa 4/10] Merge of HSA branch

2015-12-07 Thread Martin Jambor
Subject: Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

Hi,

this is https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00477.html with
the early return requested by Jakub.  Please refer to that previous
email for explanation why it is necessary.

Thanks,

2015-12-03  Martin Jambor  

* tree-inline.c (duplicate_remap_omp_clause_seq): New function.
(replace_locals_op): Duplicate gimple sequences in OMP clauses.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ebab189..dea23c7 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq,
+ struct walk_stmt_info *wi);
 
 /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local.
Using the splay_tree pointed to by ST (which is really a `splay_tree'),
@@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void *data)
  TREE_OPERAND (expr, 3) = NULL_TREE;
}
 }
+  else if (TREE_CODE (expr) == OMP_CLAUSE)
+{
+  /* Before the omplower pass completes, some OMP clauses can contain
+sequences that are neither copied by gimple_seq_copy nor walked by
+walk_gimple_seq.  To make copy_gimple_seq_and_replace_locals work even
+in those situations, we have to copy and process them explicitly.  */
+
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE)
+   {
+ gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR)
+   {
+ gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+   {
+ gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq;
+ seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq;
+   }
+}
 
   /* Keep iterating.  */
   return NULL_TREE;
@@ -5200,6 +5231,21 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+/* Create a copy of SEQ and remap all decls in it.  */
+
+static gimple_seq
+duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
+{
+  if (!seq)
+return NULL;
+
+  /* If there are any labels in OMP sequences, they can be only referred to in
+ the sequence itself and therefore we can do both here.  */
+  walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi);
+  gimple_seq copy = gimple_seq_copy (seq);
+  walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi);
+  return copy;
+}
 
 /* Copies everything in SEQ and replaces variables and labels local to
current_function_decl.  */


[hsa 5/10] OpenMP lowering/expansion changes (gridification)

2015-12-07 Thread Martin Jambor
Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.  The plan is to make that class quite a bit
bigger, but only the following is ready for submission now.

Basically, whenever the compiler configured for HSAIL generation
encounters the following pattern:

  #pragma omp target
  #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
  #pragma omp distribute parallel for firstprivate(n,j) private(i) other_sharing_clauses()
for (i = j + 1; i < n; i += 3)
  some_loop_body


it creates a copy of the entire target body and expands it slightly
differently for concurrent execution on a GPU.  Note that both teams
and distribute constructs are mandatory.  Moreover, currently the
distribute has to be in a combined statement with the inner for
construct.  And there are quite a few other restrictions which I hope
to alleviate over the next year, most notably reductions and collapse
clause now prevent gridification (see the new function
target_follows_gridifiable_pattern to find out what exactly the
restrictions are).

The first phase of the "gridification" process is run before omp
"scanning" phase.  We look for the pattern above, and if we encounter
one, we copy its entire body into a new gimple statement
GIMPLE_OMP_GPUKERNEL.  Within it, we mark the teams, distribute and
parallel constructs with a new flag "kernel_phony."  This flag will
then make OMP lowering phase process their sharing clauses like usual,
but the statements representing the constructs will be removed at
lowering (and thus will never be expanded).  The resulting wasteful
repackaging of data is nicely cleaned by our optimizers even at -O1.

At expansion time, we identify gomp_target statements with a kernel
and expand the kernel into a special function, with the loop
represented by the GPU grid and not control flow.  Afterwards, the
normal body of the target is expanded as usual.  Finally, we need to
take the grid dimensions stored within new fields of the target
statement by the first phase, store them in a structure and pass them in a
device-specific argument to GOMP_target_ext.
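
To make the "loop represented by the GPU grid" idea more concrete, here
is a rough sketch of the arithmetic for a loop of the form shown above,
for (i = j + 1; i < n; i += 3).  It is an illustration under my own
assumptions rather than what the expansion code emits: struct ex_grid
and the ex_ functions are hypothetical, only positive steps and a "<"
condition are handled, and the grid size is simply taken to be the
iteration count.

```c
/* Hypothetical container for the launch dimensions of one kernel.  */
struct ex_grid
{
  unsigned grid_size;       /* Total number of work-items.  */
  unsigned workgroup_size;  /* From thread_limit; 0 means "not given".  */
};

/* Compute the launch dimensions for
   for (i = START; i < END; i += STEP), with STEP > 0.  */
struct ex_grid
ex_compute_grid (long start, long end, long step, unsigned thread_limit)
{
  struct ex_grid g = { 0, thread_limit };
  if (step > 0 && end > start)
    g.grid_size = (unsigned) ((end - start + step - 1) / step);
  return g;
}

/* Inside the kernel there is no loop; each work-item derives its own
   value of the iteration variable from its ID in the grid.  */
long
ex_work_item_to_index (long start, long step, unsigned id)
{
  return start + step * (long) id;
}

/* Host reference: count iterations by actually running the loop.  */
unsigned
ex_count_iterations (long start, long end, long step)
{
  unsigned count = 0;
  for (long i = start; i < end; i += step)
    count++;
  return count;
}
```

For j = 0 and n = 10 the loop visits i = 1, 4 and 7, so the grid has
three work-items and work-item 2 handles i = 7.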

The patch thus also implements the compiler part of device-specific
target arguments as discussed on the mailing list and IRC.

Originally, when I started with the above pattern matching, I did not
allow any other gimple statements in between the respective omp
constructs.  That however proved to be too restrictive for two
reasons.  First, statements in pre-bodies of both distribute and for
loops needed to be accounted for when calculating the kernel grid size
(which is done before the target statement itself) and second, Fortran
parameter dereferences happily result in interleaving statements when
there were none in the user source code.

Therefore, I allow register-type stores to local non-addressable
variables in pre-bodies and also in between the OMP constructs.  All
of them are copied in front of the target statement and either used
for grid size calculation or removed as useless by later
optimizations.

I hope that eventually I managed to write the gridification in a way
that interferes very little with the rest of the OMP pipeline and yet
only re-implement the bare necessary minimum of functionality that is
already there.  Any feedback is of course still very welcome.

Thanks,

Martin


2015-12-04  Martin Jambor  

* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GPUKERNEL.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_KERNEL_BODY.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GPUKERNEL.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_gpukernel): New function.
(gimple_copy): Also handle GIMPLE_OMP_GPUKERNEL.
* gimple.def (GIMPLE_OMP_TEAMS): Moved into its own layout.
(GIMPLE_OMP_GPUKERNEL): New.
* gimple.h (gf_mask): Added GF_OMP_FOR_KIND_KERNEL_BODY.
(gomp_for): New field kernel_phony.
(gimple_statement_omp_parallel_layout): Likewise.
(gimple_statement_omp_single_layout): Updated comments.
(gomp_teams): New field kernel_phony.
(gimple_build_omp_gpukernel): Declare.
(gimple_has_substatements): Also handle GIMPLE_OMP_GPUKERNEL.
(gimple_omp_for_kernel_phony): New.
(gimple_omp_for_set_kernel_phony): Likewise.
(gimple_omp

[hsa 6/10] Pass manager changes

2015-12-07 Thread Martin Jambor
Hi,

the pass manager changes required for HSA have already been committed
to trunk so all that remains are these additions to the pass pipeline.

Thanks,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.


diff --git a/gcc/passes.def b/gcc/passes.def
index 28cb4c1..0f0f36d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -144,6 +144,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_target_clone);
+  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -377,6 +378,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
+  NEXT_PASS (pass_gen_hsail);
 
   NEXT_PASS (pass_expand);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9704918..30127d4 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -467,6 +467,7 @@ extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels2 (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@@ -491,6 +492,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);


[hsa 7/10] IPA-HSA pass

2015-12-07 Thread Martin Jambor
Hi,

when a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp
declare target" marked functions emerge out of OMP expansion as one
gimple function for both the host and the accelerator. However, at
some point we need to create a special HSA function representation so
that we can modify behavior of a (very) few optimization passes for
them.

Both are done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges to be in between HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can be registered together with libgomp.
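
The coupling of a host function with its HSA clone can be pictured with
the following sketch.  It is only a model of the idea: struct ex_fn,
ex_link_functions and ex_add_to_partition are hypothetical names and do
not correspond to the actual summary or partitioning data structures.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_kind { EX_NONE, EX_KERNEL, EX_FUNCTION };

/* Hypothetical per-function record.  */
struct ex_fn
{
  const char *name;
  enum ex_kind kind;
  bool gpu_implementation;  /* True for the HSA clone.  */
  struct ex_fn *bound;      /* Other half of the pair, or NULL.  */
  int partition;            /* LTO partition number, -1 if unassigned.  */
};

/* Couple the GPU clone with the HOST function it was created from.  */
void
ex_link_functions (struct ex_fn *gpu, struct ex_fn *host, enum ex_kind kind)
{
  gpu->kind = kind;
  host->kind = kind;
  gpu->gpu_implementation = true;
  host->gpu_implementation = false;
  gpu->bound = host;
  host->bound = gpu;
}

/* Put FN into PARTITION and drag its counterpart along, so that both
   halves of a pair always end up in the same partition.  */
void
ex_add_to_partition (struct ex_fn *fn, int partition)
{
  fn->partition = partition;
  if (fn->bound != NULL)
    fn->bound->partition = partition;
}
```

Because the link is stored in both directions, it does not matter
whether the partitioner encounters the host function or the clone first.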

Thanks,

Martin

2015-12-04  Martin Liska  
        Martin Jambor  

* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations in the
same partition as host implementations.
* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 000..5b3e563
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* Callgraph based analysis of static variables.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Martin Liska 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emitting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+{
+  warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+ "could not emit HSAIL for function %s: function cannot be "
+ "cloned", node->name ());
+  return false;
+}
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  hsa_function_summary *s = hsa_summaries->get (node);
+
+  /* A linked function is skipped.  */
+  if (s->m_binded_function != NULL)
+   continue;
+
+  if (s->m_kind != HSA_NONE)
+   {
+ if (!check_warn_node_versionable (node))
+   continue;
+ cgraph_node *clone = node->create_virtual_clone
+   (vec  (), NULL, NULL, "hsa");
+ TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+ clone->force_output = true;
+ hsa_summaries->link_functions (clone, node, s->m_kind, false);
+
+ if (dump_file)
+   fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
+clone->name (),
+s->m_kind == HSA_KERNEL ? "kernel" : &qu

[hsa 8/10] HSAIL BRIG description header file (and a steering committee request)

2015-12-07 Thread Martin Jambor
Hi,

the following patch adds a BRIG (binary representation of HSAIL)
representation description.  It is within a single header file
describing the binary structures and constants of the format.

The file comes from the HSA Foundation (I have only added the
HSA_BRIG_FORMAT_H macro and check and removed some weird comments
which are not present in proposed future versions of the file) and is
licensed under "University of Illinois/NCSA Open Source License."

The license is "GPL-compatible" according to FSF
(http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
so I believe we can have it in GCC.  Nevertheless, it is not GPL and
there is no copyright assignment for it, but the situation is
hopefully analogous to some other libraries that have their upstream
elsewhere but we ship them as part of GCC.

I would therefore like to ask the GCC steering committee for
permission to add this file to GCC (and update it as HSA standard
evolves).  Please let me know if there is something more I need to do
in this regard.

Thanks,

Martin


2015-12-04  Martin Jambor  

* hsa-brig-format.h: New file.

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
new file mode 100644
index 000..6e2fe75
--- /dev/null
+++ b/gcc/hsa-brig-format.h
@@ -0,0 +1,1277 @@
+// University of Illinois/NCSA
+// Open Source License
+//
+// Copyright (c) 2013-2015, Advanced Micro Devices, Inc.
+// All rights reserved.
+//
+// Developed by:
+//
+// HSA Team
+//
+// Advanced Micro Devices, Inc
+//
+// www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy of
+// this software and associated documentation files (the "Software"), to deal with
+// the Software without restriction, including without limitation the rights to
+// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+// of the Software, and to permit persons to whom the Software is furnished to do
+// so, subject to the following conditions:
+//
+// * Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimers.
+//
+// * Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimers in the
+//   documentation and/or other materials provided with the distribution.
+//
+// * Neither the names of the HSA Team, University of Illinois at
+//   Urbana-Champaign, nor the names of its contributors may be used to
+//   endorse or promote products derived from this Software without specific
+//   prior written permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+// SOFTWARE.
+
+#ifndef HSA_BRIG_FORMAT_H
+#define HSA_BRIG_FORMAT_H
+
+typedef uint32_t BrigVersion32_t;
+
+enum BrigVersion {
+
+BRIG_VERSION_HSAIL_MAJOR = 1,
+BRIG_VERSION_HSAIL_MINOR = 0,
+BRIG_VERSION_BRIG_MAJOR  = 1,
+BRIG_VERSION_BRIG_MINOR  = 0
+};
+
+typedef uint8_t BrigAlignment8_t;
+
+typedef uint8_t BrigAllocation8_t;
+
+typedef uint8_t BrigAluModifier8_t;
+
+typedef uint8_t BrigAtomicOperation8_t;
+
+typedef uint32_t BrigCodeOffset32_t;
+
+typedef uint8_t BrigCompareOperation8_t;
+
+typedef uint16_t BrigControlDirective16_t;
+
+typedef uint32_t BrigDataOffset32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetString32_t;
+
+typedef uint8_t BrigExecutableModifier8_t;
+
+typedef uint8_t BrigImageChannelOrder8_t;
+
+typedef uint8_t BrigImageChannelType8_t;
+
+typedef uint8_t BrigImageGeometry8_t;
+
+typedef uint8_t BrigImageQuery8_t;
+
+typedef uint16_t BrigKind16_t;
+
+typedef uint8_t BrigLinkage8_t;
+
+typedef uint8_t BrigMachineModel8_t;
+
+typedef uint8_t BrigMemoryModifier8_t;
+
+typedef uint8_t BrigMemoryOrder8_t;
+
+typedef uint8_t BrigMemoryScope8_t;
+
+typedef uint16_t BrigOpcode16_t;
+
+typedef uint32_t BrigOperandOffset32_t;
+
+typedef uint8_t BrigPack8_t;
+
+typedef uint8_t BrigProfile8_t;
+
+typedef uint16_t BrigRegisterKind16_t;
+
+typedef uint8_t BrigRound8_t;
+
+typedef uint8_t BrigSamplerAddressing8_t;
+
+typedef uint8_t BrigSamplerCoordNormalization8_t;
+
+typedef uint8_t BrigSamplerFilter8_t;
+
+typedef uint8_t BrigSamplerQuery8_t;
+
+typedef uint32_t BrigSectionIndex32_t;
+
+typedef uint8_t BrigSegCvtModifier8_t;
+
+typedef uint8_t BrigSegment8_t;
+
+typedef uint32_t BrigStringOffset32_t;
+
+typedef u

[hsa 10/10] HSA register allocator

2015-12-07 Thread Martin Jambor
Hi,

because the HSA backend is not based on RTL, we need our own register
allocator, and it is in this patch.  The allocator has been written by
Michael Matz and I have put it into a separate email so that I can add
him to CC, because he is much better suited to answer any questions or
review comments.

Thanks,

Martin


2015-12-04  Michael Matz 
Martin Jambor  

* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
new file mode 100644
index 000..9db4c1d
--- /dev/null
+++ b/gcc/hsa-regalloc.c
@@ -0,0 +1,719 @@
+/* HSAIL IL Register allocation and out-of-SSA.
+   Copyright (C) 2013-15 Free Software Foundation, Inc.
+   Contributed by Michael Matz 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "function.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "cfghooks.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+
+/* Process a PHI node PHI of basic block BB as a part of naive out-of-ssa.  */
+
+static void
+naive_process_phi (hsa_insn_phi *phi)
+{
+  unsigned count = phi->operand_count ();
+  for (unsigned i = 0; i < count; i++)
+{
+  gcc_checking_assert (phi->get_op (i));
+  hsa_op_base *op = phi->get_op (i);
+  hsa_bb *hbb;
+  edge e;
+
+  if (!op)
+   break;
+
+  e = EDGE_PRED (phi->m_bb, i);
+  if (single_succ_p (e->src))
+   hbb = hsa_bb_for_bb (e->src);
+  else
+   {
+ basic_block old_dest = e->dest;
+ hbb = hsa_init_new_bb (split_edge (e));
+
+ /* If switch insn used this edge, fix jump table.  */
+ hsa_bb *source = hsa_bb_for_bb (e->src);
+ hsa_insn_sbr *sbr;
+ if (source->m_last_insn
+ && (sbr = dyn_cast <hsa_insn_sbr *> (source->m_last_insn)))
+   sbr->replace_all_labels (old_dest, hbb->m_bb);
+   }
+
+  hsa_build_append_simple_mov (phi->m_dest, op, hbb);
+}
+}
+
+/* Naive out-of SSA.  */
+
+static void
+naive_outof_ssa (void)
+{
+  basic_block bb;
+
+  hsa_cfun->m_in_ssa = false;
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+hsa_bb *hbb = hsa_bb_for_bb (bb);
+hsa_insn_phi *phi;
+
+for (phi = hbb->m_first_phi;
+phi;
+phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)
+  naive_process_phi (phi);
+
+/* Zap PHI nodes, they will be deallocated when everything else will.  */
+hbb->m_first_phi = NULL;
+hbb->m_last_phi = NULL;
+  }
+}
+
+/* Return register class number for the given HSA TYPE.  0 means the 'c' one
+   bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
+   and 3 for 'q' 128 bit class.  */
+
+static int
+m_reg_class_for_type (BrigType16_t type)
+{
+  switch (type)
+{
+case BRIG_TYPE_B1:
+  return 0;
+
+case BRIG_TYPE_U8:
+case BRIG_TYPE_U16:
+case BRIG_TYPE_U32:
+case BRIG_TYPE_S8:
+case BRIG_TYPE_S16:
+case BRIG_TYPE_S32:
+case BRIG_TYPE_F16:
+case BRIG_TYPE_F32:
+case BRIG_TYPE_B8:
+case BRIG_TYPE_B16:
+case BRIG_TYPE_B32:
+case BRIG_TYPE_U8X4:
+case BRIG_TYPE_S8X4:
+case BRIG_TYPE_U16X2:
+case BRIG_TYPE_S16X2:
+case BRIG_TYPE_F16X2:
+  return 1;
+
+case BRIG_TYPE_U64:
+case BRIG_TYPE_S64:
+case BRIG_TYPE_F64:
+case BRIG_TYPE_B64:
+case BRIG_TYPE_U8X8:
+case BRIG_TYPE_S8X8:
+case BRIG_TYPE_U16X4:
+case BRIG_TYPE_S16X4:
+case BRIG_TYPE_F16X4:
+case BRIG_TYPE_U32X2:
+case BRIG_TYPE_S32X2:
+case BRIG_TYPE_F32X2:
+  return 2;
+
+case BRIG_TYPE_B128:
+case BRIG_TYPE_U8X16:
+case BRIG_TYPE_S8X16:
+case BRIG_TYPE_U16X8:
+case BRIG_TYPE_S16X8:
+case BRIG_TYPE_F16X8:
+case BRIG_TYPE_U32X4:
+case BRIG_TYPE_U64X2:
+case BRIG_TYPE_S32X4:
+case BRIG_TYPE_S64X2:
+case BRIG_TYPE_F32X4:
+case BRIG_TYPE_F64X2:
+  return 3;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+/* If the Ith

Re: ipa-cp heuristics fixes

2015-12-10 Thread Martin Jambor
Hi,

thanks for looking into this, I only have one question:

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed a few issues.
> First of all, ipcp_cloning_candidate_p calls
>  optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which cannot be used at WPA time, because we have no DECL_STRUCT_FUNCTION
> around.  I replaced it by node->optimize_for_size_p ().
> 
> Second, we create an incredible number of clones just because we obtain some
> sort of polymorphic call context for them.  In the vast majority of cases this
> is useless effort, because the functions in question do not contain virtual
> calls and do not pass the parameter further.  For Firefox, about 40k out of
> 50k clones are created just because we found some context.
> 
> I changed the code to only clone if this immediately leads to
> devirtualization.  This does not cause any noticeable drop in the number of
> devirtualized calls on Firefox.  I suppose we will miss the case where cloning
> a caller may allow devirtualization in a clone of a callee, but I do not think
> the heuristics for context independent values can handle this as implemented
> right now and it simply has way too many false positives.
> 
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
> 
> The third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function does some analysis for the inliner where it, for example,
> analyses if a comdat which is going to be inlined consistently in the whole
> program will be removed.
> 
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function.  I think you
> basically want to assume that the function will be removed if it has no
> address taken and it is not externally visible.  This is what the local flag
> is for.
> 
> I gathered some stats:
> 
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
> 
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high.  For example, inliner does about 300k
> inlines at Firefox.
> 
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I 
> do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
> 
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 
> (14%)
> to: Unit growth for small function inlining: 7965010->9138577
> 
> So it seems that except for a large drop in the number of clones there is no
> significant difference.
> 
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> Honza
> 
>   * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
>   (good_cloning_opportunity_p): Likewise.
>   (gather_context_independent_values): Do not return true when
>   polymorphic call context is known or when we have known aggregate
>   value of unused parameter.
>   (estimate_local_effects): Try to create clone for all context
>   when either some params are substituted or devirtualization is possible
>   or some params can be removed; use local flag instead of
>   node->will_be_removed_from_program_if_no_direct_calls_p.
>   (identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===
> --- ipa-cp.c  (revision 231477)
> +++ ipa-cp.c  (working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>return false;
>  }
>  
> -  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  if (node->optimize_for_size_p ())
>  {
>if (dump_file)
>  fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
>  {
>if (time_benefit == 0
>|| !opt_for_fn (node->decl, flag_ipa_cp_clone)
> -  || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  || node->optimize_for_size_p ())
>  return false;
>  
>gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
>   *removable_params_cost
> += ipa_get_param_move_cost (info, i);
>  
> +  if (!ipa_is_param_used (info, i))
> + continue;
> +

Is this really necessary?  Is it not enough to remove the assignment to
ret below?  If the parameter is not used, the devirtualization time
bonus, which you then rely on in estimate_local_effects, should be zero
for it.

It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-

Re: [hsa 1/10] Configury changes and new options

2015-12-10 Thread Martin Jambor
Hi,

On Tue, Dec 08, 2015 at 10:43:15PM +0000, Richard Sandiford wrote:
> [Sorry for the low-quality review, was just reading out of interest...]
> 
> Martin Jambor  writes:
> > +If you configure GCC with HSA offloading but do not have the HSA
> > +run-time library installed in a standard location then you can
> > +explicitely specify the directory where they are installed.  The
> 
> typo: explicitly

Oops.  For some reason, my spell-checker accepts this typo.  I will
fix it.

> 
> > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> > index e4772d1..5609207 100644
> > --- a/gcc/lto-wrapper.c
> > +++ b/gcc/lto-wrapper.c
> > @@ -745,6 +745,11 @@ compile_images_for_offload_targets (unsigned in_argc, 
> > char *in_argv[],
> >offload_names = XCNEWVEC (char *, num_targets + 1);
> >for (unsigned i = 0; i < num_targets; i++)
> >  {
> > +  /* HSA does not use LTO-like streaming and a different compiler, skip
> > +it. */
> > +  if (strncmp(names[i], "hsa", 3) == 0)
> > +   continue;
> > +
> >offload_names[i]
> > = compile_offload_image (names[i], compiler_path, in_argc, in_argv,
> >  compiler_opts, compiler_opt_count,
> 
> Looks like this would cause the caller loop:
> 
>   if (offload_names)
>   {
> find_offloadbeginend ();
> for (i = 0; offload_names[i]; i++)
>   printf ("%s\n", offload_names[i]);
> free_array_of_ptrs ((void **) offload_names, i);
>   }
> 
> to terminate early if there was another target after hsa.
> 

Good catch.  I have modified this code so that it never leaves any
holes in offload_names[i].
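
To illustrate what "no holes" means here, a rough sketch follows.  It is
not the code I am committing: collect_offload_names and fake_compile are
hypothetical stand-ins, plain calloc replaces XCNEWVEC, and the target
names in the usage note are made up.  The point is only that a separate
output index keeps the NULL-terminated array dense when an entry is
skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for compile_offload_image: it only duplicates
   the target name so that the sketch has something to store.  */
static char *
fake_compile (const char *name)
{
  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  assert (copy != NULL);
  memcpy (copy, name, len);
  return copy;
}

/* Build a NULL-terminated array of results for NAMES, skipping "hsa".
   The separate output index NEXT keeps the array dense, so a reader
   that stops at the first NULL still sees every stored entry.  */
static char **
collect_offload_names (const char *const *names, unsigned num_targets)
{
  char **out = calloc (num_targets + 1, sizeof (char *));
  assert (out != NULL);
  unsigned next = 0;
  for (unsigned i = 0; i < num_targets; i++)
    {
      if (strcmp (names[i], "hsa") == 0)
        continue;
      out[next++] = fake_compile (names[i]);
    }
  return out;
}
```

Called with the three names "nvptx-none", "hsa" and "intelmic", this
stores the first and the last name in slots 0 and 1 and leaves slot 2
as the terminating NULL, so the printing loop in the caller reaches
both of them.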

> names[i] is null-terminated, so it looks like you're deliberately
> allowing anything that starts with "hsa" here, but:

Right, and that was probably a mistake.  I have changed the check to
a simple strcmp.
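
For reference, the difference between the two checks can be shown with
a small sketch.  The helper names starts_with_hsa and is_hsa are
hypothetical and exist only for this illustration; the calls themselves
are the standard strncmp and strcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix test, as in the originally posted check: true for any name
   that merely starts with "hsa".  */
static bool
starts_with_hsa (const char *name)
{
  assert (name != NULL);
  return strncmp (name, "hsa", 3) == 0;
}

/* Exact test: true only for the name "hsa" itself.  */
static bool
is_hsa (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "hsa") == 0;
}
```

A made-up name such as "hsail-foo" passes the prefix test but not the
exact one, which is the inconsistency with the opts.c parsing that was
pointed out.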

> 
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 874c84f..5647f0c 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -1906,8 +1906,35 @@ common_handle_option (struct gcc_options *opts,
> >break;
> >  
> >  case OPT_foffload_:
> > -  /* Deferred.  */
> > -  break;
> > +  {
> > +   const char *p = arg;
> > +   opts->x_flag_disable_hsa = true;
> > +   while (*p != 0)
> > + {
> > +   const char *comma = strchr (p, ',');
> > +
> > +   if ((strncmp (p, "disable", 7) == 0)
> > +   && (p[7] == ',' || p[7] == '\0'))
> > + {
> > +   opts->x_flag_disable_hsa = true;
> > +   break;
> > + }
> > +
> > +   if ((strncmp (p, "hsa", 3) == 0)
> > +   && (p[3] == ',' || p[3] == '\0'))
> > + {
> > +#ifdef ENABLE_HSA
> > +   opts->x_flag_disable_hsa = false;
> > +#else
> > +   sorry ("HSA has not been enabled during configuration");
> > +#endif
> 
> ...here you only allow "hsa" itself.
> 
> (Not your fault, but: do we have any documentation for -foffload
> and -foffload-abi?  Couldn't see any in the texi files.)

Yes, that is actually PR 67300.  However, I do not understand the more
complex forms the parameter can take well enough to attempt to fix it.

In order to address all of your concerns, I am going to install the
following on the branch.

Thanks for the feedback,

Martin


2015-12-09  Martin Jambor  

* lto-wrapper.c (compile_images_for_offload_targets): Do not leave
holes in offload_names.  Use strcmp instead of strncmp.
* doc/install.texi (--with-hsa-runtime): Fix typo.
---
 gcc/doc/install.texi | 2 +-
 gcc/lto-wrapper.c| 8 +---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index afd891c..a85a063 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1993,7 +1993,7 @@ compiler will emit the accelerator code, no path should 
be specified.
 
 If you configure GCC with HSA offloading but do not have the HSA
 run-time library installed in a standard location then you can
-explicitely specify the directory where they are installed.  The
+explicitly specify the directory where they are installed.  The
 @option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
 shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 5609207..5b58fd6 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc

Re: [hsa 1/10] Configury changes and new options

2015-12-10 Thread Martin Jambor
Hi,

On Mon, Dec 07, 2015 at 12:19:08PM +0100, Martin Jambor wrote:
> Hi,
> 
> this patch contains changes to the configuration mechanism and offload
> bits, so that users can build compilers with HSA support.

when writing up how to build an HSA-enabled GCC for the wiki page and
checking that the process actually works, I realized that AMD no longer
ships libhsakmt as a part of the run time.  So we either have to tell
users to copy the library over to the same directory where the
run-time is (what I did on my machines and then forgot about it) or
provide one more configuration option, otherwise configuring libgomp
fails.  The patch below does the latter.  I have verified that the
configuration works as intended for a freshly downloaded/built HSA
run-time and libhsakmt.  I am going to commit it to the branch shortly
and of course would need it as part of the HSA merge.

Sorry for realizing this so late,

Martin

2015-12-09  Martin Jambor  

libgomp/
* plugin/configfrag.ac (hsa-kmt-lib): New.

gcc/
* doc/install.texi (Configuration): Document --with-hsa-kmt-lib.
---
 gcc/doc/install.texi |  5 +
 libgomp/configure| 17 +++--
 libgomp/plugin/configfrag.ac |  8 
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 232586d..afd891c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1999,6 +1999,11 @@ shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
 
+@item --with-hsa-kmt-lib=@var{pathname}
+
+If you configure GCC with HSA offloading but do not have the HSA
+KMT library installed in a standard location then you can
+explicitly specify the directory where it resides.
 @end table
 
 @subheading Cross-Compiler-Specific Options
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index c50e5cb..fd77429 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -118,6 +118,14 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
+AC_ARG_WITH(hsa-kmt-lib,
+   [AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
+   [specify directory for installed HSA KMT library.])])
+if test "x$with_hsa_kmt_lib" != x; then
+  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
+  HSA_RUNTIME_LIB=
+fi
+
 PLUGIN_HSA=0
 PLUGIN_HSA_CPPFLAGS=
 PLUGIN_HSA_LDFLAGS=
-- 
2.6.3



Re: [hsa 2/10] Modifications to libgomp proper

2015-12-10 Thread Martin Jambor
Hi,

thanks for the feedback.  I have incorporated most of it into the
branch (the diff is below) but I also have a few questions.

On Wed, Dec 09, 2015 at 12:35:36PM +0100, Jakub Jelinek wrote:
> On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote:
> > +/* Flag set when the subsequent element in the device-specific argument
> > +   values.  */
> > +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM   (1 << 7)
> > +
> > +/* Bitmask to apply to a target argument to find out the value identifier. 
> >  */
> > +#define GOMP_TARGET_ARG_ID_MASK(((1 << 8) - 1) << 8)
> > +/* Target argument index of NUM_TEAMS.  */
> > +#define GOMP_TARGET_ARG_NUM_TEAMS  (1 << 8)
> > +/* Target argument index of THREAD_LIMIT.  */
> > +#define GOMP_TARGET_ARG_THREAD_LIMIT   (2 << 8)
> 
> I meant that these two would be just special, passed as the first two
> pointers in the array, without the markup.  Because, otherwise you either
> need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit
> arches and for 64-bit ones shift often at runtime.  Having the markup even
> for them is perhaps cleaner, but less efficient, so if you really want to go
> that way, please make sure you handle it properly for 32-bit pointers
> architectures though.  num_teams or thread_limit could be > 32767 or >
> 65535.

I see.  I prefer the clean approach, even if it is more work, because
this interface looks like it is going to be extended in the future.
But I am wondering whether embedding the value into the identifier
element is actually worth it.  The passed array is going to be a small
local variable and I wonder whether there is going to be any benefit
in it having two elements instead of four (or four instead of six for
gridified kernels), especially if it means introducing control flow on
the part of the caller.  But if you really want it that way, I will
implement that.
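
To make the trade-off concrete, here is a rough sketch of the two
encodings being discussed.  The GOMP_TARGET_ARG_* values are copied
from the quoted patch; everything else is an assumption made only for
this illustration, in particular the 16-bit shift SKETCH_VALUE_SHIFT
for embedded values, the use of uintptr_t elements, and the helper
names encode_pair, encode_compact and decode_arg.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted patch.  */
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7)
#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)

/* Assumption for this sketch only: an embedded value sits above bit 16.  */
#define SKETCH_VALUE_SHIFT 16

/* Always-two-elements form: the identifier with the SUBSEQUENT_PARAM
   flag, then the value.  No branch in the caller.  Returns the number
   of elements written.  */
static size_t
encode_pair (uintptr_t *args, unsigned id, uintptr_t value)
{
  assert (args != NULL);
  args[0] = (uintptr_t) id | GOMP_TARGET_ARG_SUBSEQUENT_PARAM;
  args[1] = value;
  return 2;
}

/* Embedding form: pack the value into the identifier element when it
   fits into the bits above the shift, otherwise fall back to the pair.
   This is the control flow the caller would have to carry.  */
static size_t
encode_compact (uintptr_t *args, unsigned id, uintptr_t value)
{
  const unsigned value_bits
    = sizeof (uintptr_t) * CHAR_BIT - SKETCH_VALUE_SHIFT;
  if ((value >> value_bits) == 0)
    {
      args[0] = (uintptr_t) id | (value << SKETCH_VALUE_SHIFT);
      return 1;
    }
  return encode_pair (args, id, value);
}

/* Decode one argument starting at ARGS; stores the identifier and the
   value and returns the number of elements consumed.  */
static size_t
decode_arg (const uintptr_t *args, unsigned *id, uintptr_t *value)
{
  *id = (unsigned) (args[0] & GOMP_TARGET_ARG_ID_MASK);
  if (args[0] & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
    {
      *value = args[1];
      return 2;
    }
  *value = args[0] >> SKETCH_VALUE_SHIFT;
  return 1;
}
```

With 32-bit pointers and this shift, only 16 bits remain for an
embedded value, so a num_teams or thread_limit above 65535 has to take
the two-element path, which is the case Jakub is warning about.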

> 
> > -static void
> > -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> > -  void **hostaddrs, size_t *sizes,
> > -  unsigned short *kinds)
> > +static void *
> > +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
> > + size_t *sizes, unsigned short *kinds)
> >  {
> >size_t i, tgt_align = 0, tgt_size = 0;
> >char *tgt = NULL;
> > @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void 
> > *), size_t mapnum,
> >}
> >if (tgt_align)
> >  {
> > -  tgt = gomp_alloca (tgt_size + tgt_align - 1);
> > +  tgt = gomp_malloc (tgt_size + tgt_align - 1);
> 
> I don't like using gomp_malloc here, either copy/paste the function, or
> create separate inline functions for the two loops, one for the first loop
> which returns you tgt_align and tgt_size, and another for the stuff after
> the allocation.  Then you can use those two inline functions to implement
> both gomp_target_fallback_firstprivate which will use alloca, and
> gomp_target_unshare_firstprivate which will use gomp_malloc instead.

OK, I did that.
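
The shape of that split is roughly the following.  This is a simplified
sketch and not the code on the branch: the helper names
calc_requirements, copy_blocks and unshare_on_heap are hypothetical,
and alignments are passed in an explicit array here, whereas the real
code derives them from the kinds array and only handles firstprivate
entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* First loop: compute the total size and the maximum alignment needed
   for N blocks.  Every ALIGNS[i] must be a power of two.  */
static inline void
calc_requirements (size_t n, const size_t *sizes, const size_t *aligns,
                   size_t *tgt_align, size_t *tgt_size)
{
  *tgt_align = 0;
  *tgt_size = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (*tgt_align < aligns[i])
        *tgt_align = aligns[i];
      *tgt_size = (*tgt_size + aligns[i] - 1) & ~(aligns[i] - 1);
      *tgt_size += sizes[i];
    }
}

/* Second loop: align TGT, copy each block into it and redirect
   HOSTADDRS[i] to the copy.  TGT must have room for the computed size
   plus TGT_ALIGN - 1 bytes of slack.  */
static inline void
copy_blocks (char *tgt, size_t n, void **hostaddrs, const size_t *sizes,
             const size_t *aligns, size_t tgt_align)
{
  uintptr_t misalign = (uintptr_t) tgt & (tgt_align - 1);
  if (misalign)
    tgt += tgt_align - misalign;
  size_t off = 0;
  for (size_t i = 0; i < n; i++)
    {
      off = (off + aligns[i] - 1) & ~(aligns[i] - 1);
      memcpy (tgt + off, hostaddrs[i], sizes[i]);
      hostaddrs[i] = tgt + off;
      off += sizes[i];
    }
}

/* Heap-based caller in the spirit of gomp_target_unshare_firstprivate.
   Returns the buffer to free later, or NULL if nothing was copied.  */
static void *
unshare_on_heap (size_t n, void **hostaddrs, const size_t *sizes,
                 const size_t *aligns)
{
  size_t tgt_align, tgt_size;
  calc_requirements (n, sizes, aligns, &tgt_align, &tgt_size);
  if (tgt_align == 0)
    return NULL;
  char *buf = malloc (tgt_size + tgt_align - 1);
  assert (buf != NULL);
  copy_blocks (buf, n, hostaddrs, sizes, aligns, tgt_align);
  return buf;
}
```

The stack-based fallback would call the same two helpers and differ
only in the allocation between them, using alloca in place of malloc,
so the copy has to happen in the function whose frame owns the buffer.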

> 
> > @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> > and several arguments have been added:
> > FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
> > DEPEND is array of dependencies, see GOMP_task for details.
> > +   ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and 
> > a
> > +   variable number of device-specific arguments, which always take two 
> > elements
> > +   where the first specifies the type and the second the actual value.  The
> > +   last element of the array is a single NULL.
> 
> Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not
> encoded.

I have changed the comment, and I will remember to update it again if
necessary after changing omp-low.c.

> 
> > @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, 
> > size_t mapnum,
> >struct gomp_device_descr *devicep = resolve_device (device);
> >  
> >if (devicep == NULL
> > +  || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> 
> Would be nice to have some consistency in the order of capabilities checks.
> Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way
> here too.

Sure.

> 
> > @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data)
> >  
> >if (ttask->state == GOMP_TARGET_TASK_FINISHED)
> > {
> > -   
