[PR 69920] Prevent SRA from leaving a removed SSA_NAME in IL
Hi,

my fix for PR 69666 has caused quite a few regressions across the board where SRA removed an SSA_NAME which was, however, still in the IL (and usually stumbled upon it itself straight away). The removal path should not be executed when there is an SSA_NAME on the LHS; the code clearly is not ready for it. Before my patch, we always got lucky because the statement was simply modified elsewhere when the LHS was an SSA_NAME. However, even that was not 100% guaranteed because of the !access_has_replacements_p (racc) part of the changed condition.

The patch below fixes the ICEs simply by guarding the removal code so that it only runs when the LHS is not an SSA_NAME. This means that the safe path below it is going to execute.

I have bootstrapped and tested the patch on x86_64-linux. I'd like to commit it to trunk as soon as it gets approved and then I'd like to commit it to the gcc-5 branch together with the PR 69666 fix a few days afterwards. OK?

Thanks,

Martin

2016-02-26  Martin Jambor

	PR middle-end/69920
	* tree-sra.c (sra_modify_assign): Do not remove loads of
	uninitialized aggregates to SSA_NAMEs.

testsuite/
	* gcc.dg/torture/pr69932.c: New test.
	* gcc.dg/torture/pr69936.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr69932.c | 10 ++
 gcc/testsuite/gcc.dg/torture/pr69936.c | 24
 gcc/tree-sra.c                         |  3 ++-
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69932.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69936.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c b/gcc/testsuite/gcc.dg/torture/pr69932.c
new file mode 100644
index 000..4b82130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+void fn1() {
+  int b = 4;
+  short c[4];
+  c[b] = c[a];
+  if (c[2]) {}
+
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c b/gcc/testsuite/gcc.dg/torture/pr69936.c
new file mode 100644
index 000..3023bbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a;
+char b;
+void fn1(int p1) {}
+
+int fn2() { return 5; }
+
+void fn3() {
+  if (fn2())
+    ;
+  else {
+    char c[5];
+    c[0] = 5;
+  lbl_608:
+    fn1(c[9]);
+    int d = c[9];
+    c[2] | a;
+    d = c[b];
+  }
+  goto lbl_608;
+}
+
+int main() { return 0; }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 663ded2..366f413 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3504,7 +3504,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       else
         {
           if (access_has_children_p (racc)
-              && !racc->grp_unscalarized_data)
+              && !racc->grp_unscalarized_data
+              && TREE_CODE (lhs) != SSA_NAME)
             {
               if (dump_file)
                 {
--
2.7.1
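
For readers who do not have tree-sra.c at hand, the effect of the extra guard can be sketched in isolation. The snippet below is only a toy model: `struct load_facts` and `may_remove_load` are hypothetical names invented for illustration, and the three booleans merely stand in for `access_has_children_p (racc)`, `racc->grp_unscalarized_data` and `TREE_CODE (lhs) == SSA_NAME`. It is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the facts the guarded condition looks at.
   These are not GCC's real data structures.  */
struct load_facts
{
  bool has_children;       /* models access_has_children_p (racc) */
  bool unscalarized_data;  /* models racc->grp_unscalarized_data */
  bool lhs_is_ssa_name;    /* models TREE_CODE (lhs) == SSA_NAME */
};

/* Return true when a statement may take the removal path.  With the
   added third conjunct, an SSA_NAME on the LHS always falls through
   to the safe path instead.  */
bool
may_remove_load (const struct load_facts *f)
{
  assert (f != NULL);
  return f->has_children
         && !f->unscalarized_data
         && !f->lhs_is_ssa_name;
}
```

Under this model, a load whose LHS is an SSA_NAME is never removed, whatever the other two flags say, which is the behaviour the patch description asks for.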
Re: (Non-)offloading diagnostics
Hi, On Fri, Feb 26, 2016 at 05:46:33PM +0100, Thomas Schwinge wrote: > Hi! > > In light of the -Whsa testsuite patches just posted, I think we first > need to clarify the general policy questions I posted a month ago: > > On Tue, 26 Jan 2016 11:46:14 +0100, I wrote: > > On Thu, 10 Dec 2015 18:51:48 +0100, Martin Jambor wrote: > > > On Mon, Dec 07, 2015 at 12:46:45PM +0100, Jakub Jelinek wrote: > > > > On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote: > > > > > [...] There are no failing > > > > > testcases if HSA is not configured. If it is, there are some, all of > > > > > which fall into one the following categories: > > > > > > > > > > 1) HSA cannot compile a function for one reason or another (most > > > > > common cause is inability of HSA to take an address of a function > > > > > or make an indirect call) and gives a warning, which is regarded > > > > > as an "excess error" by dejagnu. > > > > Confirmed: > > > > [...]/gcc/testsuite/c-c++-common/gomp/clauses-1.c: In function > > 'bar._omp_fn.26.hsa.31': > > cc1: warning: could not emit HSAIL for the function [-Whsa] > > cc1: note: support for HSA does not implement non-gridified OpenMP > > parallel constructs. > > [...] > > > > ..., and many more. So, with --enable-offload-targets=[...],hsa we > > regress (PASS -> FAIL; "test for excess errors") such compile tests. > > > > > > It would be good if there is a -W* switch to turn such warnings off. > > > > Not just for the purposes of dejagnu libgomp testing, but say one > > > > might try to compile a program primarily say for XeonPhi or PTX > > > > offloading, > > > > but have HSA enabled to, but care primarily about the former two, etc. > > > > > > All these warnings are in the -Whsa group and can be suppressed with > > > -Wno-hsa. > > > > These compile tests are done without any -W* flags; -Whsa is enabled by > > default. > > I'm a proponent of enabling as many useful warnings by default, or if not > by default, then with -Wall. 
-Whsa is enabled by default, and has thus > set a precedent of doing that. I am not sure I'd go as far as "as many as possible," but in the case of -Whsa, the warnings get emitted only if HSA offloading is configured and especially only if the user used OMP and its target construct. This means that it is relevant only for a rather small class of users and it's not a "your code looks weird" kind of warning but a "the compiler is not doing what you clearly asked for" warning. So that is why we decided to warn unconditionally. But as far as I understand, gcc does not give any promises about warnings, so I believe decisions like a defaultness of a warning can be revisited at any point in the future, for example if people learn not to expect some constructs to be offloaded to GPUs. Moreover, the conventions regarding offloading are still being settled and still will for quite some time so nobody should really expect such details to be set in stone. > > > How to address this mismatch? Put -Wno-has into all regressing > > test case files individually? Run the affected testsuites with -Wno-hsa? > > Not enable -Whsa by default (but I agree it's useful to users)? > > (Instead, enable with -Wall, which any sane user should be specifying?) > > Even if a bit tedious, my preference actually is to add to the test cases > an (expected) dg-warning everywhere where such a non-offloading warning > currently triggers, because that's what users will be seeing (with -Whsa > enabled by default), and because that will make it obvious (PASS -> FAIL > for the warning check) when that warning disappears (say, because the > compiler can now offload the respective construct, yay). That is my opinion as well, except that given the number of warnings now (with dynamic parallelism disabled), I prefer to work on the file granularity. Also, often testcases use macros heavily and putting dg-warning into them is somewhere between weird and outright impossible. 
On the other hand, as you have probably noticed, Jakub asked me to pass -Wno-hsa to all tests instead so he seems to have the opposing point of view. I must say that I am not really ready to argue about this too much, especially if we have our own HSA testsuite directory. Martin > > > A very similar problem also exists for nvptx offloading (Nathan CCed), > > where we emit similar warnings (enabled by default). As nvptx offloading > > happens during link-time (not compile-time, as with hsa offloading), > > these don't affect GCC's compile tests, but need to be worked around in > > libgomp test cases. > > > Grüße > Thomas
Re: (Non-)offloading diagnostics
Hi, On Fri, Feb 26, 2016 at 06:51:34PM +0100, Jakub Jelinek wrote: > On Fri, Feb 26, 2016 at 06:18:13PM +0100, Martin Jambor wrote: > > > I'm a proponent of enabling as many useful warnings by default, or if not > > > by default, then with -Wall. -Whsa is enabled by default, and has thus > > > set a precedent of doing that. > > > > I am not sure I'd go as far as "as many as possible," but in the case > > of -Whsa, the warnings get emitted only if HSA offloading is > > configured and especially only if the user used OMP and its target > > construct. This means that it is relevant only for a rather small > > class of users and it's not a "your code looks weird" kind of warning > > but a "the compiler is not doing what you clearly asked for" warning. > > So that is why we decided to warn unconditionally. > > > > But as far as I understand, gcc does not give any promises about > > warnings, so I believe decisions like a defaultness of a warning can > > be revisited at any point in the future, for example if people learn > > not to expect some constructs to be offloaded to GPUs. Moreover, the > > conventions regarding offloading are still being settled and still > > will for quite some time so nobody should really expect such details > > to be set in stone. > > The thing is, most of the tests in the libgomp.{c,c++,fortran}/ testsuite > are (meant to be) valid OpenMP testcases, having them full of dozens of > dg-warning lines where every of the 10+ different offloading target warns > about something would be a maintainance nightmare. Agreed, having such dg-warnings would definitely be an overkill. I only intended to mark the whole test with an option. I am willing to be looking for new hsa warnings and examine them myself, adding the option if necessary. I would not expect the originator of the testcase or anybody who does not care for HSA to do it. > E.g. 
when adding > new OpenMP tests, one would need to configure all the offloaders > (individually?), for some you need hw not every committer has, No special hardware is necessary to see the warning (though you need at least https://github.com/HSAFoundation/HSA-Runtime-Reference-Source to build the libgomp plugin). Once gcc decides to emit HSAIL, it of course has to work as expected. There should be no need to configure hsa individually either. > for others > there are other issues (e.g., is the required amdkfd going to be submitted > for upstream kernel? I might have hard time convincing our kernel > maintainers to use that instead of what is in upstream kernel others). I am being repeatedly told it will be and very soon, but apparently it takes longer than AMD anticipated. I don't think anybody expects any distribution to pick it up on their own (...but you know, Red Hat is actually the company that now employs the upstream kernel kfd maintainer ;-). > So, IMHO if you want to check for warnings, do that as Martin has added a > new subdir with only hsa OpenMP tests, if you want test warnings on tests we > already have elsewhere, #include them in the other dir, dg-do link instead > of run (so that it is not run multiple times), and check for the warnings; > you could also use -foffload=hsa in there to make sure you only have to care > about hsa warnings, and not NVPTX, or whatever other offloader. > Just to be clear, I never wanted to be testing for presence of warnings, I see no value in that. All in all, I am willing to add -Wno-hsa to default options and only have these warnings on in dedicated HSA directories. I will amend the posted patches once testsuite maintainers look at my initial proposal for the first such directory. Martin
Re: [hsa merge 08/10] HSAIL BRIG description header file
Hi, I hope I've got some good news: On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote: > Jakub Jelinek writes: > > > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote: > >> the following patch adds a BRIG (binary representation of HSAIL) > >> representation description. It is within a single header file > >> describing the binary structures and constants of the format. > >> > >> The file comes from the HSA Foundation (I have only added the > >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments > >> which are not present in proposed future versions of the file) and is > >> licensed under "University of Illinois/NCSA Open Source License." > >> > >> The license is "GPL-compatible" according to FSF > >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses) > >> so I believe we can have it in GCC. Nevertheless, it is not GPL and > >> there is no copyright assignment for it, but the situation is > >> hopefully analogous to some other libraries that have their upstream > >> elsewhere but we ship them as part of the GCC. > >> > >> In the previous posting of this patch > >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have > >> requested a permission from the steering committee to include this file > >> with a different upstream in GCC. I have not received an official > >> reply but since I have been chosen to be the HSA maintainer, I tend to > >> think there were no legal objections against HSA going forward, > >> including this file. > > Martin, could you ask the HSA Foundation or AMD or whoever if there is > any way they could remove the second requirement of the license? It > adds yet another case where anybody distributing GCC has to list yet > another copyright notice. > I have asked HSA foundation to do just that and apparently they agreed to change the licensing of the file (in upcoming versions of HSA) to the MIT license. 
IIUC, the reading of the license header would then be the one below. I hope that means the problematic requirements will be gone and we will be able to just use their file. If, however, you still think there will be issues preventing us from doing that, please let me know as soon as possible.

Thanks,

Martin

The license is going to be:

The MIT License (MIT)

Copyright (c) 2016, HSA Foundation, Inc
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
Hi,

On Fri, Feb 26, 2016 at 05:07:56PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 26, 2016 at 04:59:57PM +0100, Martin Jambor wrote:
> > just like with the compiler gomp testsuite, we need to add -Wno-hsa to
> > options when compiling libgomp testcases in order not to have "excess
> > errors" failures when HSA is enabled. ...
>
> I don't like this very much.
> Couldn't you instead add -Wno-hsa next to -fopenmp in *.exp, and just where
> you want to explicitly check the hsa warnings, enable it manually in
> dg-options or dg-additional-options (it would need to be guarded with hsa
> being enabled etc. anyway).
>

as Jakub requested, this patch deals with HSA "excess errors" in the libgomp library testsuite by passing -Wno-hsa to all of them. IIUC, passing it in the second parameter of dg-runtest (as opposed to the third) means that it will apply even to tests that have their own dg-options, which is presumably easier for everyone, provided that hsa will get its own libgomp testsuite directories.

OK for trunk?

Thanks,

Martin

2016-02-29  Martin Jambor

	* testsuite/libgomp.c/c.exp: Pass -Wno-hsa to all tests.
	* testsuite/libgomp.c++/c++.exp: Likewise.
	* testsuite/libgomp.fortran/fortran.exp: Likewise.
---
 libgomp/testsuite/libgomp.c++/c++.exp         | 2 +-
 libgomp/testsuite/libgomp.c/c.exp             | 2 +-
 libgomp/testsuite/libgomp.fortran/fortran.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 0454f95..120e573 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -65,7 +65,7 @@ if { $lang_test_file_found } {
     }

     # Main loop.
-    dg-runtest $tests "" "$libstdcxx_includes $DEFAULT_CFLAGS"
+    dg-runtest $tests "-Wno-hsa" "$libstdcxx_includes $DEFAULT_CFLAGS"
 }

 # All done.
diff --git a/libgomp/testsuite/libgomp.c/c.exp b/libgomp/testsuite/libgomp.c/c.exp
index 300b921..d3cd144 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -31,7 +31,7 @@ append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
 set_ld_library_path_env_vars

 # Main loop.
-dg-runtest $tests "" $DEFAULT_CFLAGS
+dg-runtest $tests "-Wno-hsa" $DEFAULT_CFLAGS

 # All done.
 dg-finish
diff --git a/libgomp/testsuite/libgomp.fortran/fortran.exp b/libgomp/testsuite/libgomp.fortran/fortran.exp
index 9e6b643..ea84d5c 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -66,7 +66,7 @@ if { $lang_test_file_found } {
     # For Fortran we're doing torture testing, as Fortran has far more tests
     # with arrays etc. that testing just -O0 or -O2 is insufficient, that is
     # typically not the case for C/C++.
-    gfortran-dg-runtest $tests "" ""
+    gfortran-dg-runtest $tests "-Wno-hsa" ""
 }

 # All done.
--
2.7.1
Re: [hsa, testsuite] Suppress hsa warnings in compiler gomp tests
Hi,

as Jakub requested in another thread, this patch deals with HSA "excess errors" in the gomp compiler testsuite by passing -Wno-hsa to all of them. IIUC, passing it in the second parameter of *-dg-runtest (as opposed to the third) means that it will apply even to tests that have their own dg-options, which is presumably easier for everyone, provided that hsa will get its own libgomp testsuite directories.

OK for trunk?

Thanks,

Martin

2016-02-29  Martin Jambor

	* g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
	* gcc.dg/gomp/gomp.exp: Likewise.
	* gfortran.dg/gomp/gomp.exp: Likewise.
---
 gcc/testsuite/g++.dg/gomp/gomp.exp      | 2 +-
 gcc/testsuite/gcc.dg/gomp/gomp.exp      | 2 +-
 gcc/testsuite/gfortran.dg/gomp/gomp.exp | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gomp/gomp.exp b/gcc/testsuite/g++.dg/gomp/gomp.exp
index 7365389..bee5441 100644
--- a/gcc/testsuite/g++.dg/gomp/gomp.exp
+++ b/gcc/testsuite/g++.dg/gomp/gomp.exp
@@ -29,7 +29,7 @@ dg-init
 # Main loop.
 g++-dg-runtest [lsort [concat \
 	[find $srcdir/$subdir *.C] \
-	[find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+	[find $srcdir/c-c++-common/gomp *.c]]] "-Wno-hsa" "-fopenmp"

 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/gomp/gomp.exp b/gcc/testsuite/gcc.dg/gomp/gomp.exp
index 78623fc..d0889c5 100644
--- a/gcc/testsuite/gcc.dg/gomp/gomp.exp
+++ b/gcc/testsuite/gcc.dg/gomp/gomp.exp
@@ -31,7 +31,7 @@ dg-init
 # Main loop.
 dg-runtest [lsort [concat \
 	[find $srcdir/$subdir *.c] \
-	[find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+	[find $srcdir/c-c++-common/gomp *.c]]] "-Wno-hsa" "-fopenmp"

 # All done.
 dg-finish
diff --git a/gcc/testsuite/gfortran.dg/gomp/gomp.exp b/gcc/testsuite/gfortran.dg/gomp/gomp.exp
index 625361b..78d70b5 100644
--- a/gcc/testsuite/gfortran.dg/gomp/gomp.exp
+++ b/gcc/testsuite/gfortran.dg/gomp/gomp.exp
@@ -30,7 +30,7 @@ dg-init

 # Main loop.
 gfortran-dg-runtest [lsort \
-       [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenmp"
+       [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "-Wno-hsa" "-fopenmp"

 # All done.
 dg-finish
--
2.7.1
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
Hi On Tue, Mar 01, 2016 at 07:47:49PM +0100, Jakub Jelinek wrote: > On Tue, Mar 01, 2016 at 07:39:18PM +0100, Martin Jambor wrote: > > as Jakub requested, this patch deals with HSA "excess errors" in the > > libgomp library testsuite by passing -Wno-hsa to all of them. IIUC, > > that passing it in the second parameter of dg-runtest (as opposed to > > the third) means that it will apply even tests that have their own > > dg-options, which is presumably easier for everyone, provided that hsa > > will get is own libgomp testsuite directories. > > What is the difference betwee the $flags and $default-extra-cflags > arguments to dg-runtest? well, exactly what I wrote in the original email and what you have quoted (and me as well) above. But let me quote the dejagnu source comment of dg-runtest, which is perhaps more clear: # FLAGS is a set of options to always pass. # DEFAULT_EXTRA_FLAGS is a set of options to pass if the testcase # doesn't # specify any (with dg-option). So if I changed DEFAULT_EXTRA_FLAGS rather than FLAGS, I'd have to go through all testcases specifying dg-options and add -Wno-hsa there too. Moreover, we'd have to add -Wno-hsa to all appropriate future testcases if they specify their own dg-options. Perhaps we should be using dg-additional-options in libgomp testsuite wherever possible but there certainly are testcases using dg-options. > You seem to stick -Wno-hsa into the former, > which to me looks like it will be mentioned as part of the test > names (e.g. when cycling through -O* options, -Wno-hsa would be printed > along with -O2 etc.)? Yes, that is an unfortunate side-effect. Furthermore, automated comparison scripts might be confused by the change (mine was, reporting all testcases as newly passed/xfailed and old as disappeared). But again, I do not have a strong preference, I can change the patches to use DEFAULT_EXTRA_FLAGS and am willing to be watching for fallout and fixing dg-options if you prefer that. 
So let me know what you consider nicer and I'll do it. Thanks, Martin
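
The difference between the two dg-runtest arguments discussed above can be modelled in a few lines. This is only a sketch in C of the rule stated in the quoted DejaGnu comment, not the Tcl implementation itself: `compose_options` and `has_option` are hypothetical helpers, and a NULL `dg_options` stands for a testcase that does not specify dg-options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the quoted rule: FLAGS are always passed, while
   DEFAULT_EXTRA_FLAGS are used only when the testcase has no
   dg-options of its own (dg_options == NULL in this model).
   Returns 0 on success, -1 if the result does not fit into OUT.  */
int
compose_options (char *out, size_t out_size, const char *flags,
                 const char *default_extra_flags, const char *dg_options)
{
  assert (out != NULL && flags != NULL && default_extra_flags != NULL);
  const char *extra = dg_options != NULL ? dg_options : default_extra_flags;
  int n = snprintf (out, out_size, "%s %s", flags, extra);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}

/* Return nonzero if OPT occurs somewhere in OPTIONS (plain substring
   match, which is enough for this illustration).  */
int
has_option (const char *options, const char *opt)
{
  return strstr (options, opt) != NULL;
}
```

In this model, putting -Wno-hsa into FLAGS reaches every test, whereas putting it into DEFAULT_EXTRA_FLAGS silently drops it for any test that carries its own dg-options, which is the trade-off the thread is weighing.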
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
Hi,

On Tue, Mar 01, 2016 at 11:06:43PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 01, 2016 at 10:47:46PM +0100, Martin Jambor wrote:
> > well, exactly what I wrote in the original email and what you have
> > quoted (and me as well) above. But let me quote the dejagnu source
> > comment of dg-runtest, which is perhaps more clear:
> >
> > # FLAGS is a set of options to always pass.
> > # DEFAULT_EXTRA_FLAGS is a set of options to pass if the testcase
> > # doesn't
> > # specify any (with dg-option).
> >
> > So if I changed DEFAULT_EXTRA_FLAGS rather than FLAGS, I'd have to go
> > through all testcases specifying dg-options and add -Wno-hsa there
> > too. Moreover, we'd have to add -Wno-hsa to all appropriate future
> > testcases if they specify their own dg-options.
>
> Ah, ok; what about adding
> # Disable HSA warnings by default.
> lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
> in libgomp/testsuite/lib/libgomp.exp (next to e.g.
> -fno-diagnostics-show-caret)?
>

That works nicely (though I have to override it explicitly in the libgomp.hsa.c directory with another -Whsa, but I guess we can live with that). So I will use the above for the libgomp case.

I have tried to come up with a similar alternative for gcc.dg/gomp/gomp.exp, g++.dg/gomp/gomp.exp and gfortran.dg/gomp/gomp.exp but so far I have not managed to make the C++ and Fortran cases work in any other way than passing -Wno-hsa in FLAGS (and thus changing the name). For C, adding the following before the main loop works, even though it looks too much like a hack to me:

  global TEST_ALWAYS_FLAGS
  set TEST_ALWAYS_FLAGS [concat $TEST_ALWAYS_FLAGS "-Wno-hsa"]

However, the C++ and Fortran cases use gfortran-dg-runtest to cycle through a set of torture options and I have not yet discovered the right magic variable to set (for example, adding -Wno-hsa to DG_TORTURE_OPTIONS elements does not work).

I'm afraid I have spent way too much time on this already, so unless someone has any ideas, I'd suggest that we use the (already approved) name-changing gomp patch as it is. Or at least for C++ and Fortran. Thanks, Martin
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
Hi, On Fri, Mar 04, 2016 at 04:31:29PM +0100, Jakub Jelinek wrote: > On Fri, Mar 04, 2016 at 04:27:11PM +0100, Martin Jambor wrote: > > > Ah, ok; what about adding > > > # Disable HSA warnings by default. > > > lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa" > > > in libgomp/testsuite/lib/libgomp.exp (next to e.g. > > > -fno-diagnostics-show-caret)? > > > > > > > That works nicely (though I have to override it explicitely in the > > libgomp.hsa.c directory with another -Whsa, but I guess we can live > > with that). So I will use the above for the libgomp case. > > Ok. > > > I have tried to come up with a similar alternative for > > gcc.dg/gomp/gomp.exp, g++.dg/gomp/gomp.exp and gfortran/gomp/gomp.exp > > but so far I have not achieved to make the C++ and Fortran cases work > > in any other way but pass -Wno-hsa in FLAGS (and thus change the > > name). For C, adding the following before the main loop works, even > > though it looks too much like a hack to me: > > > > global TEST_ALWAYS_FLAGS > > set TEST_ALWAYS_FLAGS [concat $TEST_ALWAYS_FLAGS "-Wno-hsa"] > > Doesn't this also cause the -Wno-hsa option on all further tests executed by > other *.exp after gomp.exp by the same runtest invocation? > Not in the limited runs that I experimented with so far, but I certainly kept this possibility in mind too. If so, I would either set it back before invoking dg-finish or dismiss the whole idea. > > However, the C++ and Fortran cases use gfortran-dg-runtest to cycle > > through a set of torture options and I have not yet discovered the > > right magic variable to set (for example, adding -Wno-hsa to > > DG_TORTURE_OPTIONS elements does not work). > > > > I'm afraid I have spent way too much time on this already, so unless > > someone has any ideas, I'd suggest that we use the (already approved) > > name-changing gomp patch as it is. Or at least for C++ and Fortran. > > Do you have URL for what you refer to? 
> Sure, the patch has been posted here: https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00071.html and approved here: https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00074.html Martin
Re: [hsa, testsuite] Suppress hsa warnings in libgomp tests
On Fri, Mar 04, 2016 at 05:04:31PM +0100, Jakub Jelinek wrote: > On Fri, Mar 04, 2016 at 05:01:34PM +0100, Martin Jambor wrote: > > Not in the limited runs that I experimented with so far, but I > > certainly kept this possibility in mind too. If so, I would either > > set it back before invoking dg-finish or dismiss the whole idea. > > > > > > However, the C++ and Fortran cases use gfortran-dg-runtest to cycle > > > > through a set of torture options and I have not yet discovered the > > > > right magic variable to set (for example, adding -Wno-hsa to > > > > DG_TORTURE_OPTIONS elements does not work). > > > > > > > > I'm afraid I have spent way too much time on this already, so unless > > > > someone has any ideas, I'd suggest that we use the (already approved) > > > > name-changing gomp patch as it is. Or at least for C++ and Fortran. > > > > > > Do you have URL for what you refer to? > > > > > > > Sure, the patch has been posted here: > > > > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00071.html > > For the g*.dg/gomp/, if you'd only move -Wno-hsa into the last argument > next to -fopenmp, how many tests would be affected? Out of 287 files that have dg-options with them in the gomp directories, only 9 generate hsa warnings: c-c++-common/gomp/clauses-1.c:/* { dg-options "-fopenmp" } */ c-c++-common/gomp/if-1.c:/* { dg-options "-fopenmp" } */ c-c++-common/gomp/pr61486-2.c:/* { dg-options "-fopenmp" } */ c-c++-common/gomp/target-teams-1.c:/* { dg-options "-fopenmp -fdump-tree-gimple" } */ g++.dg/gomp/target-teams-1.C:// { dg-options "-fopenmp -fdump-tree-gimple" } gcc.dg/gomp/pr68128-2.c:/* { dg-options "-O2 -fopenmp -fdump-tree-omplower" } */ gfortran.dg/gomp/target1.f90:! { dg-options "-fopenmp" } gfortran.dg/gomp/target2.f90:! { dg-options "-fopenmp -ffree-line-length-160" } gfortran.dg/gomp/target3.f90:! { dg-options "-fopenmp" } > If not really many, > perhaps those could be changed to use dg-additional-options instead of > dg-options. 
I do not know what -ffree-line-length-160 is, but probably all of them, even though putting -O2 in gcc.dg/gomp/pr68128-2.c to "additional" flags feels just wrong. However, the real question is: Would such a solution really be much better than the first version of the patch (https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01813.html)? After all, in comparison it would only avoid touching two tests and it will not avoid issues with tests added in future if they use dg-options. Martin
Backport fix of PR 69666 and PR 69920 to gcc-5 branch
Hi,

a week has passed with PR 69920 fix in and it seems to have fixed all issues caused by the fix to PR 69666, which I have reverted on the gcc-5 branch. So I am going to un-do that revert and backport the PR 69920 fix in one commit to the branch, after final bootstrap and testing runs finish (actually, it has passed successfully on x86_64-linux, there is one on i686 that is still running).

Thanks,

Martin

2016-03-03  Martin Jambor

	PR tree-optimization/69666
	PR middle-end/69920
	* tree-sra.c (sra_modify_assign): Do not attempt to create
	default_def replacements for unscalarizable regions.  Do not
	remove loads of uninitialized aggregates to SSA_NAMEs.

testsuite/
	* gcc.dg/torture/pr69932.c: New test.
	* gcc.dg/torture/pr69936.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/torture/pr69932.c b/gcc/testsuite/gcc.dg/torture/pr69932.c
new file mode 100644
index 000..4b82130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69932.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+void fn1() {
+  int b = 4;
+  short c[4];
+  c[b] = c[a];
+  if (c[2]) {}
+
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr69936.c b/gcc/testsuite/gcc.dg/torture/pr69936.c
new file mode 100644
index 000..3023bbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr69936.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a;
+char b;
+void fn1(int p1) {}
+
+int fn2() { return 5; }
+
+void fn3() {
+  if (fn2())
+    ;
+  else {
+    char c[5];
+    c[0] = 5;
+  lbl_608:
+    fn1(c[9]);
+    int d = c[9];
+    c[2] | a;
+    d = c[b];
+  }
+  goto lbl_608;
+}
+
+int main() { return 0; }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 145a07c..3457aac 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3242,6 +3242,7 @@ sra_modify_assign (gimple stmt, gimple_stmt_iterator *gsi)
     }
   else if (racc
            && !racc->grp_unscalarized_data
+           && !racc->grp_unscalarizable_region
            && TREE_CODE (lhs) == SSA_NAME
            && !access_has_replacements_p (racc))
     {
@@ -3405,7 +3406,8 @@ sra_modify_assign (gimple stmt, gimple_stmt_iterator *gsi)
       else
         {
           if (access_has_children_p (racc)
-              && !racc->grp_unscalarized_data)
+              && !racc->grp_unscalarized_data
+              && TREE_CODE (lhs) != SSA_NAME)
             {
               if (dump_file)
                 {
[hsa] Consolidate GTY roots for trees used during expansion to HSA
Hi,

when testing the experimental hsa branch, where dynamic parallelism is not disabled and get_hsa_kernel_dispatch_offset is executed quite a bit more frequently, I have come across hsa_kernel_dispatch_type being freed by gcc even though it is marked with a GTY flag. The reason is that the file hsa-gen.c is not listed in GTFILES in Makefile.in (and thus it does not have and does not include its gcc header file).

This was the only intended GTY root in this file but there is another one in hsa-brig.c for lists of statements to put into static constructors and destructors. Rather than adding these files to GTFILES, I have decided to move the tree roots to hsa.c, which is already there, since GTY roots are really only needed for these few rather special occasions.

The patch below, which does just that, passed bootstrap and testing and HSA testing on both trunk and the branch. I will commit it in a few moments.

Thanks,

Martin

2016-03-02  Martin Jambor

	* hsa.h (hsa_get_ctor_statements): Declare.
	(hsa_get_dtor_statements): Likewise.
	(hsa_get_kernel_dispatch_type): Likewise.
	* hsa.c (hsa_get_ctor_statements): New function.
	(hsa_get_dtor_statements): Likewise.
	(hsa_get_kernel_dispatch_type): Likewise.
	* hsa-brig.c (hsa_cdtor_statements): Removed.
	(hsa_output_libgomp_mapping): Use hsa_get_ctor_statements and
	hsa_get_dtor_statements.
	* hsa-gen.c (hsa_kernel_dispatch_type): Removed.
	(get_hsa_kernel_dispatch_offset): Use hsa_get_kernel_dispatch_type.
--- gcc/hsa-brig.c | 14 ++ gcc/hsa-gen.c | 13 ++--- gcc/hsa.c | 25 + gcc/hsa.h | 3 +++ 4 files changed, 40 insertions(+), 15 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index 61cfd8b..2a301be 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -2006,8 +2006,6 @@ hsa_brig_emit_omp_symbols (void) emit_directive_variable (hsa_num_threads); } -static GTY(()) tree hsa_cdtor_statements[2]; - /* Create and return __hsa_global_variables symbol that contains all informations consumed by libgomp to link global variables with their string names used by an HSA kernel. */ @@ -2408,6 +2406,7 @@ hsa_output_libgomp_mapping (tree brig_decl) = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_REGISTER); gcc_checking_assert (offload_register); + tree *hsa_ctor_stmts = hsa_get_ctor_statements (); append_to_statement_list (build_call_expr (offload_register, 4, build_int_cstu (unsigned_type_node, @@ -2416,15 +2415,15 @@ hsa_output_libgomp_mapping (tree brig_decl) build_fold_addr_expr (hsa_libgomp_host_table), build_int_cst (integer_type_node, GOMP_DEVICE_HSA), build_fold_addr_expr (hsa_img_descriptor)), - &hsa_cdtor_statements[0]); + hsa_ctor_stmts); - cgraph_build_static_cdtor ('I', hsa_cdtor_statements[0], -DEFAULT_INIT_PRIORITY); + cgraph_build_static_cdtor ('I', *hsa_ctor_stmts, DEFAULT_INIT_PRIORITY); tree offload_unregister = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_UNREGISTER); gcc_checking_assert (offload_unregister); + tree *hsa_dtor_stmts = hsa_get_dtor_statements (); append_to_statement_list (build_call_expr (offload_unregister, 4, build_int_cstu (unsigned_type_node, @@ -2433,9 +2432,8 @@ hsa_output_libgomp_mapping (tree brig_decl) build_fold_addr_expr (hsa_libgomp_host_table), build_int_cst (integer_type_node, GOMP_DEVICE_HSA), build_fold_addr_expr (hsa_img_descriptor)), - &hsa_cdtor_statements[1]); - cgraph_build_static_cdtor ('D', hsa_cdtor_statements[1], -DEFAULT_INIT_PRIORITY); + hsa_dtor_stmts); + cgraph_build_static_cdtor ('D', *hsa_dtor_stmts, 
DEFAULT_INIT_PRIORITY); } /* Emit the brig module we have compiled to a section in the final assembly and diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index d7d39f0..fc59fa5 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3772,20 +3772,19 @@ gen_set_num_threads (tree value, hsa_bb *hbb) hbb->append_insn (basic); } -static GTY (()) tree hsa_kernel_dispatch_type = NULL; - /* Return byte offset of a FIELD_NAME in GOMP_hsa_kernel_dispatch which is defined in plugin-hsa.c. */ static HOST_WIDE_INT get_hsa_kernel_dispatch_offset (const char *field_name) { - if (hsa_kernel_dispatch_type == NULL) + tree *hsa_kernel_dispatch_type = hsa_get_kernel_dispatch_type (); + if (*hsa_kernel_dispatch_type == NULL) { /* Collection of information needed for a dispatch of a kernel from a kernel. Keep in sync with libgomp's plugin-hsa.c. */ - hsa_kernel_dispatch_type = make_node (RECORD_TYPE); + *hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
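The pattern the patch uses, keeping the root variable in one file and handing out its address through an accessor so that callers can still initialize it lazily, can be sketched without any GCC machinery. Everything below is hypothetical (`toy_type`, `toy_get_dispatch_type`, `toy_dispatch_size`); in particular there is no garbage collector here, so the sketch only shows the accessor shape, not the GTY marking itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node describing a record type.  */
struct toy_type
{
  int size;
};

/* The single root lives in one translation unit.  In the patch this role
   is played by a GTY-marked static in hsa.c.  */
static struct toy_type *toy_dispatch_type_root = NULL;

/* Accessor returning the address of the root, so that other files can
   both read it and initialize it on first use.  */
static struct toy_type **
toy_get_dispatch_type (void)
{
  return &toy_dispatch_type_root;
}

/* Backing storage for the lazily created object in this toy model.  */
static struct toy_type toy_storage;

/* A user in "another file": build the type on first use through the
   accessor, then keep reusing it.  The size 64 is an arbitrary value.  */
static int
toy_dispatch_size (void)
{
  struct toy_type **slot = toy_get_dispatch_type ();
  if (*slot == NULL)
    {
      toy_storage.size = 64;
      *slot = &toy_storage;
    }
  assert (*slot != NULL);
  return (*slot)->size;
}
```

Returning a pointer to the root rather than its value is what lets the caller keep the `if (*slot == NULL)` initialization where it was, which matches the shape of the change to get_hsa_kernel_dispatch_offset above.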
[hsa testsuite 0/5] Re-post of all pending patches adjusting testsuite for HSA
Hi, in order to consolidate things, I have decided to re-post all "hsa testsuite" patches under this thread. With the patches applied, we do not get any spurious failures because of hsa warnings or libgomp testcases failing because they are run on the host fallback. Moreover, the first patch adds simple dump-scan compile-time gridification tests and the last patch adds a special directory for run-time C tests of hsa which are run only when HSA devices are actually selected for offloading. In the future, I'll likely propose similar C++ and Fortran directories. All patches were tested by running the whole testsuite on patched trunk: - that was configured for all languages except go but not configured for HSA, - that was configured for all languages except go and also for HSA offloading, but an HSA device was not present on the machine, and - running the whole suite after configuring trunk for C, C++ and Fortran on a computer with an HSA APU, and subsequently comparing generated .sum files with unpatched trunk. Thanks for any feedback (and approvals ;-), Martin
[hsa testsuite 3/5] Suppress hsa warnings in libgomp tests
Hi, just like with the compiler gomp testsuite, we need to add -Wno-hsa to options when compiling libgomp testcases in order not to have "excess errors" failures when HSA is enabled. There are quite a few such testcases on the trunk because I have disabled the dynamic parallelism way of executing stuff. The patch below adds the option to all libgomp testsuite compilations, so that people who are not interested in HSA do not need to care. The patch has been tested both with and without HSA enabled. OK for trunk? Thanks, Martin 2016-03-04 Martin Jambor * testsuite/lib/libgomp.exp (libgomp_init): Append -Wno-hsa to ALWAYS_CFLAGS. --- libgomp/testsuite/lib/libgomp.exp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index 154a447..bbc2c26 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -237,6 +237,9 @@ proc libgomp_init { args } { # Disable caret lappend ALWAYS_CFLAGS "additional_flags=-fno-diagnostics-show-caret" +# Disable HSA warnings by default. +lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa" + # Disable color diagnostics lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never" -- 2.7.1
[hsa testsuite 1/5] Gridification tests
Hi, the patch below adds a DejaGNU effective target predicate (is that the correct dejagnu term?) offload_hsa so that selected tests can be run only if the hsa offloading is enabled. I hope it is fairly standard stuff. Additionally, it adds one C/C++ and one Fortran testsuite to check that gridification happens. Tested, both with and without HSA enabled. OK for trunk? Thanks, Martin 2016-02-10 Martin Jambor * target-supports.exp (check_effective_target_offload_hsa): New. * c-c++-common/gomp/gridify-1.c: New test. * gfortran.dg/gomp/gridify-1.f90: Likewise. --- gcc/testsuite/c-c++-common/gomp/gridify-1.c | 54 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 | 16 + gcc/testsuite/lib/target-supports.exp| 8 + 3 files changed, 78 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-1.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c b/gcc/testsuite/c-c++-common/gomp/gridify-1.c new file mode 100644 index 000..ba7a866 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target offload_hsa } */ +/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */ + +void +foo1 (int n, int *a, int workgroup_size) +{ + int i; +#pragma omp target +#pragma omp teams thread_limit(workgroup_size) +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) +for (i = 0; i < n; i++) + a[i]++; +} + +void +foo2 (int j, int n, int *a) +{ + int i; +#pragma omp target teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) +for (i = j + 1; i < n; i++) + a[i] = i; +} + +void +foo3 (int j, int n, int *a) +{ + int i; +#pragma omp target teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) + for (i = j + 1; i < n; i += 3) +a[i] = i; +} + +void +foo4 (int j, int n, int *a) +{ +#pragma omp parallel + { +#pragma omp single +{ + int i; 
+#pragma omp target +#pragma omp teams +#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j) + for (i = j + 1; i < n; i += 3) + a[i] = i; +} + } +} + + +/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified GPGPU kernel" 4 "omplower" } } */ diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 new file mode 100644 index 000..00ff7f5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-require-effective-target offload_hsa } +! { dg-options "-fopenmp -fdump-tree-omplower-details" } */ + +subroutine vector_square(n, a, b) + integer i, n, b(n), a(n) +!$omp target teams +!$omp distribute parallel do + do i=1,n + b(i) = a(i) * a(i) + enddo +!$omp end distribute parallel do +!$omp end target teams +end subroutine vector_square + +! { dg-final { scan-tree-dump "Target construct will be turned into a gridified GPGPU kernel" "omplower" } } diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 0b4252f..fac4c3c 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6936,3 +6936,11 @@ proc check_effective_target_offload_nvptx { } { int main () {return 0;} } "-foffload=nvptx-none" ] } + +# Return 1 if the compiler has been configured with hsa offloading. + +proc check_effective_target_offload_hsa { } { +return [check_no_compiler_messages offload_hsa assembly { + int main () {return 0;} +} "-foffload=hsa" ] +} -- 2.7.1
[hsa testsuite 5/5] New directory for HSA-specific C testcases
Hi, we would like a place to have some HSA-specific tests, which would run only when HSA is not just enabled at configuration time but HSA hardware is also present and used for offloading. I have proposed the first version of this patch as https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01817.html and got some feedback from Mike Stump in https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00086.html. I hope I have incorporated his suggestions. As I wrote in the cover letter, it is likely I'll propose similar C++ and Fortran directories in the future. Is the patch OK for trunk? Thanks, Martin 2016-03-03 Martin Jambor * testsuite/lib/libgomp.exp (check_effective_target_hsa_offloading_selected_nocache): New. (check_effective_target_hsa_offloading_selected): Likewise. * testsuite/libgomp.hsa.c/c.exp: Likewise. * testsuite/libgomp.hsa.c/alloca-1.c: Likewise. * testsuite/libgomp.hsa.c/bitfield-1.c: Likewise. * testsuite/libgomp.hsa.c/builtins-1.c: Likewise. * testsuite/libgomp.hsa.c/complex-1.c: Likewise. * testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise. * testsuite/libgomp.hsa.c/function-call-1.c: Likewise. * testsuite/libgomp.hsa.c/get-level-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-1.c: Likewise. * testsuite/libgomp.hsa.c/gridify-2.c: Likewise. * testsuite/libgomp.hsa.c/gridify-3.c: Likewise. * testsuite/libgomp.hsa.c/gridify-4.c: Likewise. * testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise. * testsuite/libgomp.hsa.c/pr69568.c: Likewise. * testsuite/libgomp.hsa.c/rotate-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-1.c: Likewise. * testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise. 
--- libgomp/testsuite/lib/libgomp.exp | 53 +++ libgomp/testsuite/libgomp.hsa.c/alloca-1.c | 25 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c | 160 + libgomp/testsuite/libgomp.hsa.c/builtins-1.c | 97 + libgomp/testsuite/libgomp.hsa.c/c.exp | 42 ++ libgomp/testsuite/libgomp.hsa.c/complex-1.c| 65 + .../testsuite/libgomp.hsa.c/formal-actual-args-1.c | 83 +++ libgomp/testsuite/libgomp.hsa.c/function-call-1.c | 50 +++ libgomp/testsuite/libgomp.hsa.c/get-level-1.c | 26 libgomp/testsuite/libgomp.hsa.c/gridify-1.c| 26 libgomp/testsuite/libgomp.hsa.c/gridify-2.c| 26 libgomp/testsuite/libgomp.hsa.c/gridify-3.c| 39 + libgomp/testsuite/libgomp.hsa.c/gridify-4.c| 45 ++ .../testsuite/libgomp.hsa.c/memory-operations-1.c | 92 libgomp/testsuite/libgomp.hsa.c/pr69568.c | 41 ++ libgomp/testsuite/libgomp.hsa.c/rotate-1.c | 39 + libgomp/testsuite/libgomp.hsa.c/switch-1.c | 145 +++ libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c | 116 +++ 18 files changed, 1170 insertions(+) create mode 100644 libgomp/testsuite/libgomp.hsa.c/alloca-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/builtins-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/c.exp create mode 100644 libgomp/testsuite/libgomp.hsa.c/complex-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/formal-actual-args-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/function-call-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/get-level-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-2.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-3.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-4.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/memory-operations-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/pr69568.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/rotate-1.c create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-1.c 
create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp index bbc2c26..0d5b6d4 100644 --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -395,3 +395,56 @@ proc check_effective_target_openacc_host_selected { } { } return 0; } + +# Return 1 if the selected OMP device is actually a HSA device + +proc check_effective_target_hsa_offloading_selected_nocache {} { +global tool + +set src { + int main () { + int v = 1; + #pragma omp target map(from:v) + v = 0; + return v; + } +} + +set result [eval [list check_compile hsa_offloading_src executable $src] ""] +set lines [lindex $result 0] +set output [lindex $result 1] + +set ok 0 +if { [string
[hsa testsuite 4/5] Adjust libgomp tests that do not work on host fallback
Hi, this patch avoids run-time failures in the libgomp testsuite that currently happen when HSA offloading is actually used. All of these tests require the offload_device effective target, which the patch changes to the offload_device_nonshared_as one. For some tests, such as libgomp.c/examples-4/device-1.c, this is clearly just the correct thing to do because the test explicitly checks that changes that happen in a target construct and are not "mapped" back are not observable on the host. However, the majority of the tests have a different problem. If a test for some reason is not compiled into HSAIL (usually because it would require the dynamic parallelism path which is disabled or because it calls abort from within target which HSA so far cannot handle), the host fallback is called, even though the test actually is not supposed to be run on it. Such problematic tests then call omp_is_initial_device to verify they are not running on the host and decide to fail when they figure out they are. Changing the effective target only to devices with non-shared memory probably isn't the really correct fix. We basically want to disable the host fallback for them regardless of address spaces but I cannot think of a simple and generic way of doing that. However, all testcases for non-shared memory devices were written with disallowed fallback in mind and so this solution also gives the desired result. Perhaps we need something better for the long term, any suggestions are welcome. Tested both with and without HSA (enabled or present). OK for trunk? Thanks, Martin 2016-02-12 Martin Jambor libgomp/ * testsuite/libgomp.c/examples-4/async_target-2.c: Only run on non-shared memory accelerators. * testsuite/libgomp.c/examples-4/device-1.c: Likewise. * testsuite/libgomp.c/examples-4/target-5.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-6.c: Likewise. * testsuite/libgomp.c/examples-4/target_data-7.c: Likewise. 
* testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise. * testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise. * testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise. --- libgomp/testsuite/libgomp.c/examples-4/async_target-2.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/device-1.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target-5.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target_data-6.c | 2 +- libgomp/testsuite/libgomp.c/examples-4/target_data-7.c | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/async_target-2.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target-5.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target_data-6.f90 | 2 +- libgomp/testsuite/libgomp.fortran/examples-4/target_data-7.f90 | 2 +- 10 files changed, 10 insertions(+), 10 deletions(-) diff --git a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c index ce63328..0c76f8e 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c +++ b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/device-1.c b/libgomp/testsuite/libgomp.c/examples-4/device-1.c index dad8572..46aa160 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/device-1.c +++ b/libgomp/testsuite/libgomp.c/examples-4/device-1.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/target-5.c 
b/libgomp/testsuite/libgomp.c/examples-4/target-5.c index 1853fba..1c14bae 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/target-5.c +++ b/libgomp/testsuite/libgomp.c/examples-4/target-5.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c index affeb49..57c7c0c 100644 --- a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c +++ b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-require-effective-target offload_device } */ +/* { dg-require-effective-target offload_device_nonshared_as } */ #include #include diff --git a/libgomp/test
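The way such a test notices that it ended up on the host fallback can be sketched as follows. This is a minimal illustration rather than one of the adjusted testcases: `toy_target_ran_on_host` is a hypothetical helper, and the `_OPENMP` guard is an assumption added so that the snippet also builds when OpenMP is not enabled, in which case it simply reports the host.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return 1 if the target region below executed on the host (the initial
   device) and 0 if it executed on an offloading device.  Without OpenMP
   there is no target region at all, so the answer is always 1.  */
static int
toy_target_ran_on_host (void)
{
  int on_host = 1;
#ifdef _OPENMP
  /* The flag is mapped back from the device so the host can inspect it.  */
#pragma omp target map(from: on_host)
  on_host = omp_is_initial_device () != 0;
#endif
  assert (on_host == 0 || on_host == 1);
  return on_host;
}
```

A test of the kind described above would typically abort when the helper returns 1, which is why a silent host fallback shows up as a run-time failure instead of being skipped.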
[hsa testsuite 2/5] Suppress hsa warnings in compiler gomp tests
Hi, as Jakub requested, this patch deals with HSA "excess errors" in the gomp compiler testsuite by passing -Wno-hsa to all of them. After discussing this in the thread about similar libgomp tests[1] (which are however handled differently), Jakub expressed preference for passing the option in default_extra_flags rather than flags so that names of the tests do not change. This however requires that the failing tests which use dg-options must be adjusted. There are 9 of them, most of them have just superfluous -fopenmp in them which can be removed because that is the default and the rest are handled by turning dg-options into dg-additional-options. OK for trunk? Thanks, Martin [1] https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00381.html 2016-03-04 Martin Jambor * c-c++-common/gomp/clauses-1.c: Remove dg-options. * c-c++-common/gomp/if-1.c: Likewise. * c-c++-common/gomp/pr61486-2.c: Likewise. * c-c++-common/gomp/target-teams-1.c: Moved dg-options except -fopenmp to dg-additional-options. * g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests. * g++.dg/gomp/target-teams-1.C: Likewise. * gcc.dg/gomp/gomp.exp: Likewise. * gcc.dg/gomp/pr68128-2.c: Moved dg-options except -fopenmp to dg-additional-options. * gfortran.dg/gomp/gomp.exp: Likewise. * gfortran.dg/gomp/target1.f90: Remove dg-options. * gfortran.dg/gomp/target2.f90: Moved dg-options except -fopenmp to dg-additional-options. * gfortran.dg/gomp/target3.f90: Remove dg-options. 
--- gcc/testsuite/c-c++-common/gomp/clauses-1.c | 1 - gcc/testsuite/c-c++-common/gomp/if-1.c | 1 - gcc/testsuite/c-c++-common/gomp/pr61486-2.c | 1 - gcc/testsuite/c-c++-common/gomp/target-teams-1.c | 2 +- gcc/testsuite/g++.dg/gomp/gomp.exp | 2 +- gcc/testsuite/g++.dg/gomp/target-teams-1.C | 2 +- gcc/testsuite/gcc.dg/gomp/gomp.exp | 2 +- gcc/testsuite/gcc.dg/gomp/pr68128-2.c| 2 +- gcc/testsuite/gfortran.dg/gomp/gomp.exp | 2 +- gcc/testsuite/gfortran.dg/gomp/target1.f90 | 1 - gcc/testsuite/gfortran.dg/gomp/target2.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/target3.f90 | 1 - 12 files changed, 7 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/c-c++-common/gomp/clauses-1.c b/gcc/testsuite/c-c++-common/gomp/clauses-1.c index 2d1c352..91aed39 100644 --- a/gcc/testsuite/c-c++-common/gomp/clauses-1.c +++ b/gcc/testsuite/c-c++-common/gomp/clauses-1.c @@ -1,5 +1,4 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ /* { dg-additional-options "-std=c99" { target c } } */ int t; diff --git a/gcc/testsuite/c-c++-common/gomp/if-1.c b/gcc/testsuite/c-c++-common/gomp/if-1.c index 4ba708c..3a9b538 100644 --- a/gcc/testsuite/c-c++-common/gomp/if-1.c +++ b/gcc/testsuite/c-c++-common/gomp/if-1.c @@ -1,5 +1,4 @@ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ void foo (int a, int b, int *p, int *q) diff --git a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c index db97143..4a68023 100644 --- a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c +++ b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c @@ -1,6 +1,5 @@ /* PR middle-end/61486 */ /* { dg-do compile } */ -/* { dg-options "-fopenmp" } */ /* { dg-require-effective-target alloca } */ #pragma omp declare target diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c index 0a707c2..51b8d48 100644 --- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c +++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c @@ -1,5 +1,5 @@ /* 
{ dg-do compile } */ -/* { dg-options "-fopenmp -fdump-tree-gimple" } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ int v = 6; void bar (int); diff --git a/gcc/testsuite/g++.dg/gomp/gomp.exp b/gcc/testsuite/g++.dg/gomp/gomp.exp index 7365389..d26596c 100644 --- a/gcc/testsuite/g++.dg/gomp/gomp.exp +++ b/gcc/testsuite/g++.dg/gomp/gomp.exp @@ -29,7 +29,7 @@ dg-init # Main loop. g++-dg-runtest [lsort [concat \ [find $srcdir/$subdir *.C] \ - [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp" + [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp -Wno-hsa" # All done. dg-finish diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C b/gcc/testsuite/g++.dg/gomp/target-teams-1.C index 0a97de0..f78a608 100644 --- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C +++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-fopenmp -fdump-tree-gimple" } +// { dg-additional-options "-fdump-tree-gimple" } int v = 6; void bar (int); diff --git a/gcc/testsuite/gcc.dg/gomp/gomp.exp b/gcc/testsuite/gcc.dg/gomp/gomp.exp index 78623fc..b6b5932 100644 --- a/gcc/testsuit
Re: [RFC][PR69708] IPA inline not working for function reference in static const struc
rough jump functions, as shown by xfailing ipcp-cstagg-7.c testcase. To fix that, we'd either have to force propagation of aggregate values from constant globals even through jump functions that have agg_preserved flag cleared, or, and I think this is perhaps a better idea, rethink the whole approach, give up creating aggregate jump functions and instead use normal scalar propagation (even for non-scalar types, if they are exact copies of a read-only aggregate) and change the consumers so that they use the static initializers to look up the value. This would also have the added advantage that parameter PARAM_IPA_MAX_AGG_ITEMS would not be an issue. So the current effort below is basically only for reference, hopefully we'll be able to implement the second approach at some point during stage1. Thanks for raising this issue, Martin 2016-03-10 Martin Jambor * ipa-prop.c (count_constants_in_agg_constructor): New function. (build_agg_jump_func_from_constructor): Likewise. (determine_locally_known_aggregate_parts): Use them to process global constant variables. (parm_preserved_before_stmt_p): Return true for loads from TREE_READONLY parameters. testsuite/ * gcc.dg/ipa/ipcp-cstagg-1.c: New test. * gcc.dg/ipa/ipcp-cstagg-2.c: Likewise. * gcc.dg/ipa/ipcp-cstagg-3.c: Likewise. * gcc.dg/ipa/ipcp-cstagg-4.c: Likewise. * gcc.dg/ipa/ipcp-cstagg-5.c: Likewise. * gcc.dg/ipa/ipcp-cstagg-6.c: Likewise. * gcc.dg/ipa/ipcp-cstagg-7.c: Likewise. 
--- gcc/ipa-prop.c | 141 +++ gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-1.c | 32 +++ gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c | 39 + gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-3.c | 37 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-4.c | 39 + gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-5.c | 59 + gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-6.c | 81 ++ gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-7.c | 46 ++ 8 files changed, 474 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-1.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-2.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-3.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-4.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-5.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-6.c create mode 100644 gcc/testsuite/gcc.dg/ipa/ipcp-cstagg-7.c diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index d62c704..e0bc307 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -803,6 +803,11 @@ parm_preserved_before_stmt_p (struct ipa_func_body_info *fbi, int index, bool modified = false; ao_ref refd; + tree base = get_base_address (parm_load); + gcc_assert (TREE_CODE (base) == PARM_DECL); + if (TREE_READONLY (base)) +return true; + /* FIXME: FBI can be NULL if we are being called from outside ipa_node_analysis or ipcp_transform_function, which currently happens during inlining analysis. It would be great to extend fbi's lifetime and @@ -1395,6 +1400,121 @@ build_agg_jump_func_from_list (struct ipa_known_agg_contents_list *list, } } +/* Return how many interprocedural scalar invariants there are in a static + CONSTRUCTOR of a variable. 
*/ + +static unsigned +count_constants_in_agg_constructor (tree constructor) +{ + unsigned res = 0, max = PARAM_VALUE (PARAM_IPA_MAX_AGG_ITEMS); + unsigned ix; + tree index, val; + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val) +{ + if (TREE_CODE (TREE_TYPE (constructor)) == RECORD_TYPE + && !index) + /* We cannot handle field elements that do not have the field decl as + its index. */ + continue; + if (is_gimple_reg_type (TREE_TYPE (val)) + && is_gimple_ip_invariant (val)) + res++; + else if (TREE_CODE (val) == CONSTRUCTOR) + res += count_constants_in_agg_constructor (val); + + if (res > max) + return max; +} + return res; +} + +/* Push invariants from static constructor of a global variable into JFUNC's + aggregate jump function. BASE_OFFSET is the offset which should be added to + offset of each value. It can be negative to represent that only a part of + an aggregate starting at BASE_OFFSET is being passed as an actual + argument. */ + +static void +build_agg_jump_func_from_constructor (tree constructor, + HOST_WIDE_INT base_offset, + struct ipa_jump_func *jfunc) +{ + tree type = TREE_TYPE (constructor); + if (TREE_CODE (type) != ARRAY_TYPE + && TREE_CODE (type) != RECORD_TYPE) +return; + + unsigned ix; + tree index, val; + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (constructor), ix, index, val) +{ + if (!jfunc->agg.items->space (1)) +
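For readers without the PR at hand, the general shape of code the thread is about, a function reference stored in a static const structure and called through a pointer to that structure, looks roughly like the sketch below. It is not one of the ipcp-cstagg testcases; all `toy_` names are made up, and whether a particular compiler version turns the indirect call into a direct or inlined one is exactly the open question of the thread, so nothing is claimed about that here.

```c
#include <assert.h>
#include <stddef.h>

/* A small "vtable-like" aggregate holding a function reference.  */
struct toy_ops
{
  int (*apply) (int);
  int bias;
};

static int
toy_double (int x)
{
  return 2 * x;
}

/* Read-only global whose initializer is fully known at compile time.  */
static const struct toy_ops toy_table = { toy_double, 3 };

/* The callee only sees a pointer to the aggregate.  Resolving ops->apply
   to toy_double requires knowing the contents of the pointed-to constant
   global, which is what aggregate jump functions try to describe.  */
static int
toy_run (const struct toy_ops *ops, int x)
{
  assert (ops != NULL && ops->apply != NULL);
  return ops->apply (x) + ops->bias;
}

/* The caller passes the address of the constant global.  */
static int
toy_entry (int x)
{
  return toy_run (&toy_table, x);
}
```

The alternative approach described in the message would, under the same assumptions, look the value up in the static initializer of `toy_table` at the point of use instead of recording each member in a jump function.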
[omp] Create openmp -fopt-info optimization group
Hi, the following patch does two things. First, it creates a new optinfo group for OpenMP and moves OpenMP lowering and expansion to this group. Second, it changes all gridification MSG_NOTE dumps to MSG_MISSED_OPTIMIZATION, which is more appropriate. (Apparently, I remembered to change the dump about performed gridification to MSG_OPTIMIZED_LOCATIONS last autumn but failed to do it for dumps with failure reasons). With these changes, users that configured their compiler with HSA can use (for example) the -fopt-info-all-openmp option to get information about which target constructs have been gridified and which were not: mjambor@virgil:~/gcc/hsa/tests/grid$ ~/gcc/hsa/inst/bin/gcc -fopenmp -O combined-hsa.c -fopt-info-all-openmp combined-hsa.c:9:9: note: Target construct will be turned into a gridified GPGPU kernel or /home/mjambor/gcc/hsa/src/libgomp/testsuite/libgomp.c/examples-4/target_data-3.c:50:10: note: Will not turn target construct into a simple GPGPU kernel because it does not have a sole teams construct in it. and so forth. I have bootstrapped and tested the patch on x86_64-linux (with and without configured HSA) and by running make info and examining the generated info files. Since it is only a dumping change, I'd like to propose it for trunk even at this late stage. If release managers however do not think it is desirable, I'll commit it to the hsa branch and propose to trunk again once stage1 opens. Thanks, Martin 2016-03-14 Martin Jambor * doc/invoke.texi (-fopt-info): Document openmp optimization group. * doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP. * dumpfile.c (optgroup_options): Add entry for OpenMP optimizations. * dumpfile.h (OPTGROUP_OPENMP): New define. * omp-low.c (pass_data_expand_omp): Change optinfo_flags to OPTGROUP_OPENMP. (pass_data_expand_omp_ssa): Likewise. (pass_data_lower_omp): Likewise. (pass_data_omp_simd_clone): Likewise. 
(grid_find_single_omp_among_assignments_1): Changed all occurrences of MSG_NOTE to MSG_MISSED_OPTIMIZATION. (grid_find_single_omp_among_assignments): Likewise. (grid_target_follows_gridifiable_pattern): Likewise. --- gcc/doc/invoke.texi | 2 ++ gcc/doc/optinfo.texi | 3 +++ gcc/dumpfile.c | 1 + gcc/dumpfile.h | 3 ++- gcc/omp-low.c| 56 ++-- 5 files changed, 36 insertions(+), 29 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 99ac11b..5c798a4 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12194,6 +12194,8 @@ Enable dumps from all interprocedural optimizations. Enable dumps from all loop optimizations. @item inline Enable dumps from all inlining optimizations. +@item openmp +Enable dumps from OpenMP optimizations. @item vec Enable dumps from all vectorization optimizations. @item optall diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi index 3c8fdba..20ca560 100644 --- a/gcc/doc/optinfo.texi +++ b/gcc/doc/optinfo.texi @@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}. @item OPTGROUP_INLINE Inlining passes. Enabled by @option{-inline}. +@item OPTGROUP_OPENMP +OpenMP passes. Enabled by @option{-openmp}. + @item OPTGROUP_VEC Vectorization passes. Enabled by @option{-vec}. 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c index 144e371..f2430f3 100644 --- a/gcc/dumpfile.c +++ b/gcc/dumpfile.c @@ -136,6 +136,7 @@ static const struct dump_option_value_info optgroup_options[] = {"ipa", OPTGROUP_IPA}, {"loop", OPTGROUP_LOOP}, {"inline", OPTGROUP_INLINE}, + {"openmp", OPTGROUP_OPENMP}, {"vec", OPTGROUP_VEC}, {"optall", OPTGROUP_ALL}, {NULL, 0} diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h index c168cbf..72f696b 100644 --- a/gcc/dumpfile.h +++ b/gcc/dumpfile.h @@ -97,7 +97,8 @@ enum tree_dump_index #define OPTGROUP_LOOP(1 << 2) /* Loop optimization passes */ #define OPTGROUP_INLINE (1 << 3) /* Inlining passes */ #define OPTGROUP_VEC (1 << 4) /* Vectorization passes */ -#define OPTGROUP_OTHER (1 << 5) /* All other passes */ +#define OPTGROUP_OPENMP (1 << 5) /* OpenMP specific transformations */ +#define OPTGROUP_OTHER (1 << 6) /* All other passes */ #define OPTGROUP_ALL(OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \ | OPTGROUP_VEC | OPTGROUP_OTHER) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 82dec9d..6f42717 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -13990,7 +13990,7 @@ const pass_data pass_data_expand_omp = { GIMPLE_PASS, /* type */ "ompexp", /* name */ - OPTGROUP_NONE, /* optinfo_flags */ + OPTGROUP_OPENMP, /* optinfo_flags */ TV_NONE, /* tv_id */ PROP_gimple_any, /* properties_require
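The dumpfile.c and dumpfile.h hunks add one table row and one bit. How a group name given on the command line could be mapped to such a bit can be sketched as below. The bit values for loop, inline, vec, openmp and other follow the quoted hunk; the value used for the ipa group is an assumption because its definition is not part of the quoted context, and `toy_lookup_optgroup` is a hypothetical helper, not the parsing code in dumpfile.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_OPTGROUP_NONE   0
#define TOY_OPTGROUP_IPA    (1 << 1)	/* Assumed, not shown in the hunk.  */
#define TOY_OPTGROUP_LOOP   (1 << 2)
#define TOY_OPTGROUP_INLINE (1 << 3)
#define TOY_OPTGROUP_VEC    (1 << 4)
#define TOY_OPTGROUP_OPENMP (1 << 5)	/* The newly added group.  */
#define TOY_OPTGROUP_OTHER  (1 << 6)	/* Moved up by one bit.  */

struct toy_option_value
{
  const char *name;
  int value;
};

/* Name-to-flag table terminated by a NULL name, in the spirit of the
   optgroup_options array touched by the patch.  */
static const struct toy_option_value toy_optgroup_options[] = {
  { "ipa", TOY_OPTGROUP_IPA },
  { "loop", TOY_OPTGROUP_LOOP },
  { "inline", TOY_OPTGROUP_INLINE },
  { "openmp", TOY_OPTGROUP_OPENMP },
  { "vec", TOY_OPTGROUP_VEC },
  { NULL, 0 }
};

/* Return the flag for NAME, or TOY_OPTGROUP_NONE if it is unknown.  */
static int
toy_lookup_optgroup (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; toy_optgroup_options[i].name != NULL; i++)
    if (strcmp (toy_optgroup_options[i].name, name) == 0)
      return toy_optgroup_options[i].value;
  return TOY_OPTGROUP_NONE;
}
```

Because every group is a distinct bit, a pass can belong to a group by storing that bit in its optinfo_flags, as the omp-low.c hunk does for the OpenMP passes.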
[hsa branch] Use an obstack instead of multiple alloc pools
Hi, when I started working on expansion to HSAIL almost three years ago, I decided to allocate memory for most of the structures from various alloc-pools for reasons that never materialized and the number of pools later grew to unreasonable numbers. So after an internal discussion, Martin Liska wrote the following patch which changes allocations from the various hsa alloc pools to allocations from one obstack. I have just committed the patch to the hsa branch after testing it. Thanks, Martin 2016-03-17 Martin Liska Martin Jambor * hsa-gen.c (hsa_allocp_operand_address): Removed. (hsa_allocp_operand_immed): Likewise. (hsa_allocp_operand_reg): Likewise. (hsa_allocp_operand_code_list): Likewise. (hsa_allocp_operand_operand_list): Likewise. (hsa_allocp_inst_basic): Likewise. (hsa_allocp_inst_phi): Likewise. (hsa_allocp_inst_mem): Likewise. (hsa_allocp_inst_atomic): Likewise. (hsa_allocp_inst_signal): Likewise. (hsa_allocp_inst_seg): Likewise. (hsa_allocp_inst_cmp): Likewise. (hsa_allocp_inst_br): Likewise. (hsa_allocp_inst_sbr): Likewise. (hsa_allocp_inst_call): Likewise. (hsa_allocp_inst_arg_block): Likewise. (hsa_allocp_inst_comment): Likewise. (hsa_allocp_inst_queue): Likewise. (hsa_allocp_inst_srctype): Likewise. (hsa_allocp_inst_packed): Likewise. (hsa_allocp_inst_cvt): Likewise. (hsa_allocp_inst_alloca): Likewise. (hsa_allocp_bb): Likewise. (hsa_obstack): New. (hsa_init_data_for_cfun): Initialize obstack. (hsa_deinit_data_for_cfun): Release memory of the obstack. (hsa_op_immed::operator new): Use obstack instead of object_allocator. (hsa_op_reg::operator new): Likewise. (hsa_op_address::operator new): Likewise. (hsa_op_code_list::operator new): Likewise. (hsa_op_operand_list::operator new): Likewise. (hsa_insn_basic::operator new): Likewise. (hsa_insn_phi::operator new): Likewise. (hsa_insn_br::operator new): Likewise. (hsa_insn_sbr::operator new): Likewise. (hsa_insn_cmp::operator new): Likewise. (hsa_insn_mem::operator new): Likewise. 
(hsa_insn_atomic::operator new): Likewise. (hsa_insn_signal::operator new): Likewise. (hsa_insn_seg::operator new): Likewise. (hsa_insn_call::operator new): Likewise. (hsa_insn_arg_block::operator new): Likewise. (hsa_insn_comment::operator new): Likewise. (hsa_insn_srctype::operator new): Likewise. (hsa_insn_packed::operator new): Likewise. (hsa_insn_cvt::operator new): Likewise. (hsa_insn_alloca::operator new): Likewise. (hsa_init_new_bb): Likewise. --- gcc/hsa-gen.c | 227 ++ 1 file changed, 68 insertions(+), 159 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index f66eb53..36bc52d 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -38,7 +38,6 @@ along with GCC; see the file COPYING3. If not see #include "dumpfile.h" #include "gimple-pretty-print.h" #include "diagnostic-core.h" -#include "alloc-pool.h" #include "gimple-ssa.h" #include "tree-phinodes.h" #include "stringpool.h" @@ -125,31 +124,7 @@ struct hsa_queue uint64_t id; }; -/* Alloc pools for allocating basic hsa structures such as operands, - instructions and other basic entities. 
*/ -static object_allocator *hsa_allocp_operand_address; -static object_allocator *hsa_allocp_operand_immed; -static object_allocator *hsa_allocp_operand_reg; -static object_allocator *hsa_allocp_operand_code_list; -static object_allocator *hsa_allocp_operand_operand_list; -static object_allocator *hsa_allocp_inst_basic; -static object_allocator *hsa_allocp_inst_phi; -static object_allocator *hsa_allocp_inst_mem; -static object_allocator *hsa_allocp_inst_atomic; -static object_allocator *hsa_allocp_inst_signal; -static object_allocator *hsa_allocp_inst_seg; -static object_allocator *hsa_allocp_inst_cmp; -static object_allocator *hsa_allocp_inst_br; -static object_allocator *hsa_allocp_inst_sbr; -static object_allocator *hsa_allocp_inst_call; -static object_allocator *hsa_allocp_inst_arg_block; -static object_allocator *hsa_allocp_inst_comment; -static object_allocator *hsa_allocp_inst_queue; -static object_allocator *hsa_allocp_inst_srctype; -static object_allocator *hsa_allocp_inst_packed; -static object_allocator *hsa_allocp_inst_cvt; -static object_allocator *hsa_allocp_inst_alloca; -static object_allocator *hsa_allocp_bb; +static struct obstack hsa_obstack; /* List of pointers to all instructions that come from an object allocator. */ static vec hsa_instructions; @@ -467,52 +442,7 @@ static void hsa_init_data_for_cfun () { hsa_init_compilation_unit_data (); -
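The general pattern the patch moves to, one obstack serving all allocations and released in one go, can be sketched outside of GCC as follows. This is a minimal sketch that assumes the GNU C library's <obstack.h>; struct operand, struct insn and the pool_* and new_* functions are hypothetical stand-ins, not the hsa-gen.c code.

```c
#include <obstack.h>
#include <stdlib.h>

/* obstack.h expects these two macros to name the chunk allocator.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical stand-ins for two kinds of HSA structures.  */
struct operand
{
  int kind;
};

struct insn
{
  int opcode;
  struct operand *op;
};

/* One obstack replaces a separate pool per structure type.  */
static struct obstack pool;

/* Prepare the obstack; must run before any allocation.  */
void
pool_init (void)
{
  obstack_init (&pool);
}

/* Allocate an operand of KIND from the shared obstack.  */
struct operand *
new_operand (int kind)
{
  struct operand *op = obstack_alloc (&pool, sizeof *op);
  op->kind = kind;
  return op;
}

/* Allocate an instruction with OPCODE and operand OP from the same
   obstack.  */
struct insn *
new_insn (int opcode, struct operand *op)
{
  struct insn *insn = obstack_alloc (&pool, sizeof *insn);
  insn->opcode = opcode;
  insn->op = op;
  return insn;
}

/* Release everything allocated so far.  Passing NULL frees all objects
   and leaves the obstack uninitialized, so pool_init must be called
   again before reuse.  */
void
pool_release (void)
{
  obstack_free (&pool, NULL);
}
```

The point of the design is that objects of different types share one allocator and nothing is freed individually; a single release at the end of processing a function is enough.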
Re: [PATCH] Retry to emit global variables in HSA (PR hsa/70234)
Hi, On Tue, Mar 15, 2016 at 12:59:03PM +0100, Martin Liska wrote: > Hi. > > As emission of a HSAIL function can fail for various reason (-Whsa), > we must guarantee that a global variable is declared and at maximum once. > > Following patch does that, patch can survive make check-target-libgomp and > HSAILAsm is happy with BRIG output of declare_target-5.c source file. > > Currently, I'm running bootstrap on x86_64-linux-gnu. > Ready to install after if finishes? > > Thanks, > Martin > > gcc/ChangeLog: > > 2016-03-15 Martin Liska > > PR hsa/70234 > * hsa-brig.c (emit_function_directives): Mark unemitted > global variables for emission. > * hsa-gen.c (hsa_symbol::hsa_symbol): Initialize a new flag. > (get_symbol_for_decl): Likewise. > * hsa.h (struct hsa_symbol): New flag. > --- > gcc/hsa-brig.c | 2 ++ > gcc/hsa-gen.c | 22 +++--- > gcc/hsa.h | 3 +++ > 3 files changed, 24 insertions(+), 3 deletions(-) > > diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c > index 2a301be..9b6c0b8 100644 > --- a/gcc/hsa-brig.c > +++ b/gcc/hsa-brig.c > @@ -643,6 +643,8 @@ emit_function_directives (hsa_function_representation *f, > bool is_declaration) >if (!f->m_declaration_p) > for (int i = 0; f->m_global_symbols.iterate (i, &sym); i++) >{ > + gcc_assert (!sym->m_emitted_to_brig); > + sym->m_emitted_to_brig = true; > emit_directive_variable (sym); > brig_insn_count++; >} > diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c > index 5939a57..473d4bd 100644 > --- a/gcc/hsa-gen.c > +++ b/gcc/hsa-gen.c > @@ -162,7 +162,7 @@ hsa_symbol::hsa_symbol () > m_directive_offset (0), m_type (BRIG_TYPE_NONE), > m_segment (BRIG_SEGMENT_NONE), m_linkage (BRIG_LINKAGE_NONE), m_dim (0), > m_cst_value (NULL), m_global_scope_p (false), m_seen_error (false), > -m_allocation (BRIG_ALLOCATION_AUTOMATIC) > +m_allocation (BRIG_ALLOCATION_AUTOMATIC), m_emitted_to_brig (false) > { > } > > @@ -174,7 +174,7 @@ hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t > segment, > m_directive_offset (0), m_type (type), m_segment 
(segment),
> m_linkage (linkage), m_dim (0), m_cst_value (NULL),
> m_global_scope_p (global_scope_p), m_seen_error (false),
> -m_allocation (allocation)
> +m_allocation (allocation), m_emitted_to_brig (false)
> {
> }
>
> @@ -880,11 +880,27 @@ get_symbol_for_decl (tree decl)
>    gcc_checking_assert (slot);
>    if (*slot)
>      {
> +      hsa_symbol *sym = (*slot);
> +
>        /* If the symbol is problematic, mark current function also as
>           problematic. */
> -      if ((*slot)->m_seen_error)
> +      if (sym->m_seen_error)
>          hsa_fail_cfun ();
>
> +      /* PR hsa/70234: If a global variable was marked to be emitted,
> +         but HSAIL generation of a function using the variable fails,
> +         we should retry to emit the variable in context of a different
> +         function.
> +
> +         Iterate elements whether a symbol is already in m_global_symbols
> +         of not. */
> +      for (unsigned i = 0; i < hsa_cfun->m_global_symbols.length (); i++)
> +        if (hsa_cfun->m_global_symbols[i] == sym)
> +          return *slot;
> +
> +      if (is_in_global_vars && !sym->m_emitted_to_brig)
> +        hsa_cfun->m_global_symbols.safe_push (sym);
> +

Hopefully the linear search in m_global_symbols never becomes
prohibitively expensive. But it is only necessary when
is_in_global_vars is true, so at least we could do something like:

  if (is_in_global_vars && !sym->m_emitted_to_brig)
    {
      for (unsigned i = 0; i < hsa_cfun->m_global_symbols.length (); i++)
        if (hsa_cfun->m_global_symbols[i] == sym)
          return *slot;
      hsa_cfun->m_global_symbols.safe_push (sym);
    }

OK with that change. And even though I have seen the bug only on the
hsa branch, commit the fix to trunk too, I think it can happen there
as well.

Thanks a lot,

Martin
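The shape of the suggested change, searching the vector only when the symbol is a candidate for being pushed, can be illustrated outside of GCC like this. It is only a sketch: struct symbol, struct symbol_vec and remember_global_symbol are hypothetical stand-ins for hsa_symbol, m_global_symbols and the relevant part of get_symbol_for_decl, and the fixed capacity is an assumption made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for hsa_symbol; only the flag discussed above.  */
struct symbol
{
  bool emitted_to_brig;
};

/* Hypothetical fixed-capacity stand-in for the m_global_symbols vector.  */
struct symbol_vec
{
  struct symbol *elts[16];
  size_t length;
};

/* Record SYM in VEC unless it is already there.  The linear search runs
   only when SYM is a global variable that has not been emitted yet.
   Return true if SYM was pushed, false otherwise.  */
bool
remember_global_symbol (struct symbol_vec *vec, struct symbol *sym,
                        bool is_in_global_vars)
{
  if (!is_in_global_vars || sym->emitted_to_brig)
    return false;

  for (size_t i = 0; i < vec->length; i++)
    if (vec->elts[i] == sym)
      return false;

  /* Sketch only: a real vector would grow instead of refusing.  */
  if (vec->length == sizeof vec->elts / sizeof vec->elts[0])
    return false;

  vec->elts[vec->length++] = sym;
  return true;
}
```

Local symbols and already emitted globals never pay for the search, which is the whole point of moving the loop under the condition.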
Re: [HSA, PATCH] Enhance dump output
Hi, On Mon, Mar 21, 2016 at 12:14:19PM +0100, Martin Liska wrote: > Hello. > > Following patch enhances dump output for SBR instructions and > provides a BRIG offset of HSA symbols. The change does not touch any > code generation snippet and I hope it can be installed during the stage4? yes, but... > > Patch can bootstrap on x86_64-linux-gnu and survives > make check-target-libgomp. > > Ready for trunk? > Thanks, > Martin > From f59542322d584a1c61bfbd0148c90671a89d0593 Mon Sep 17 00:00:00 2001 > From: marxin > Date: Tue, 15 Mar 2016 11:57:30 +0100 > Subject: [PATCH] HSA: enhance dump output > > gcc/ChangeLog: > > 2016-03-15 Martin Liska > > * hsa-dump.c (dump_hsa_insn_1): dump default branch of SBR > insns. > (dump_hsa_symbol): Dump BRIG offset of hsa_symbols. > --- > gcc/hsa-dump.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c > index c5f1f69..ad0c8bf 100644 > --- a/gcc/hsa-dump.c > +++ b/gcc/hsa-dump.c > @@ -721,6 +721,10 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol) > >if (symbol->m_type & BRIG_TYPE_ARRAY_MASK) > fprintf (f, "[%lu]", (unsigned long) symbol->m_dim); > + > + ...please remove the added newlines here... > + if (symbol->m_directive_offset) > +fprintf (f, " /* BRIG offset: %u", > symbol->m_directive_offset); ...and I think you are missing an ending "*/" in the string you dump. > } > > /* Dump textual representation of HSA IL operand OP to file F. */ > @@ -929,7 +933,8 @@ dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int > *indent) > fprintf (f, ", "); > } > > - fprintf (f, "]"); > + fprintf (f, "] /* default: BB %i */", > +hsa_bb_for_bb (sbr->m_default_bb)->m_index); I think I've approved this already? Thanks, Martin
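To make the remark about the missing terminator concrete, here is a small stand-alone sketch of the annotation with the closing marker in place. format_brig_offset is a hypothetical helper that writes into a buffer so the result is easy to inspect; the patch itself prints to the dump file with fprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the BRIG offset annotation for OFFSET into
   BUF of SIZE bytes, including the terminating comment marker.  Nothing
   is produced for a zero offset, mirroring the condition in the patch.
   Return the number of characters the annotation needs.  */
int
format_brig_offset (char *buf, size_t size, unsigned offset)
{
  if (!offset)
    {
      if (size)
        buf[0] = '\0';
      return 0;
    }
  return snprintf (buf, size, " /* BRIG offset: %u */", offset);
}
```

With the terminator present, the rest of the dumped line is no longer swallowed by an unterminated comment when someone pastes the dump into an editor with C highlighting.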
Re: [HSA, PATCH] Allocate memory for shadow arg (PR hsa/70337)
Hi, On Mon, Mar 21, 2016 at 01:49:25PM +0100, Martin Liska wrote: > Hello. > > Following patch fixes an invalid write in HSA plug-in. > I've been running bootstrap and regression tests on x86-linux-gnu. > > Ready after it finishes? > Thanks, > Martin > From 2674ceb5fddeaeb26ff87d26a43bddaf40060ea2 Mon Sep 17 00:00:00 2001 > From: marxin > Date: Mon, 21 Mar 2016 13:34:04 +0100 > Subject: [PATCH] Allocate memory for shadow arg (PR hsa/70337) > > libgomp/ChangeLog: > > 2016-03-21 Martin Liska > > PR hsa/70337 > * plugin/plugin-hsa.c (create_single_kernel_dispatch): Allocate > memory for hsa_kernel_runtime * argument. > --- > libgomp/plugin/plugin-hsa.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c > index d888493..36b3cf4 100644 > --- a/libgomp/plugin/plugin-hsa.c > +++ b/libgomp/plugin/plugin-hsa.c > @@ -884,9 +884,10 @@ create_single_kernel_dispatch (struct kernel_info > *kernel, >shadow->private_segment_size = kernel->private_segment_size; >shadow->group_segment_size = kernel->group_segment_size; > > - status > -= hsa_memory_allocate (agent->kernarg_region, > kernel->kernarg_segment_size, > -&shadow->kernarg_address); > + size_t kernarg_size = kernel->kernarg_segment_size > ++ sizeof (struct hsa_kernel_runtime *); This is strange. The pointer to the shadow data structure is, from the HSA perspective, a normal kernel argument and therefore should already be included in the kernel->kernarg_segment_size. Have you checked that the values are indeed off? Martin > + status = hsa_memory_allocate (agent->kernarg_region, kernarg_size, > + &shadow->kernarg_address); >if (status != HSA_STATUS_SUCCESS) > hsa_fatal ("Could not allocate memory for HSA kernel arguments", status); > > -- > 2.7.1 >
Re: [HSA, PATCH] Allocate memory for shadow arg (PR hsa/70337)
On Mon, Mar 21, 2016 at 09:51:27PM +0100, Martin Liska wrote: > On 03/21/2016 07:23 PM, Martin Jambor wrote: > >This is strange. The pointer to the shadow data structure is, from > >the HSA perspective, a normal kernel argument and therefore should > >already be included in the kernel->kernarg_segment_size. Have you > >checked that the values are indeed off? > > Hi Martin. > > You are right that size of a shadow argument pointer should be > included in the kernel->kernarg_segment_size. I've been currently > testing a proper patch which conditionally copies shadow argument. > > Thanks, > Martin > > From 413707c51bf4b0ac7f8dac6421be9955c18767dd Mon Sep 17 00:00:00 2001 > From: marxin > Date: Mon, 21 Mar 2016 21:40:03 +0100 > Subject: [PATCH] Copy shadow argument conditionally (PR hsa/70337) > > libgomp/ChangeLog: > > 2016-03-21 Martin Liska > > PR hsa/70337 > * plugin/plugin-hsa.c (GOMP_OFFLOAD_run): Copy shadow > argument just in case a dispatched kernel uses that argument. This is OK, thanks, Martin > --- > libgomp/plugin/plugin-hsa.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c > index d888493..f7ef600 100644 > --- a/libgomp/plugin/plugin-hsa.c > +++ b/libgomp/plugin/plugin-hsa.c > @@ -1255,8 +1255,16 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, > void **args) >hsa_signal_store_relaxed (s, 1); >memcpy (shadow->kernarg_address, &vars, sizeof (vars)); > > - memcpy (shadow->kernarg_address + sizeof (vars), &shadow, > - sizeof (struct hsa_kernel_runtime *)); > + /* PR hsa/70337. 
*/ > + size_t vars_size = sizeof (vars); > + if (kernel->kernarg_segment_size > vars_size) > +{ > + if (kernel->kernarg_segment_size != vars_size > + + sizeof (struct hsa_kernel_runtime *)) > + GOMP_PLUGIN_fatal ("Kernel segment size has an unexpected value"); > + memcpy (packet->kernarg_address + vars_size, &shadow, > + sizeof (struct hsa_kernel_runtime *)); > +} > >HSA_DEBUG ("Copying kernel runtime pointer to kernarg_address\n"); > > -- > 2.7.1 >
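The logic of the approved hunk, copying the shadow pointer only when the kernel argument segment has room for it, can be sketched without the HSA runtime as follows. fill_kernargs is a hypothetical function; unlike the patch, which calls GOMP_PLUGIN_fatal on an unexpected size, the sketch returns false, and the check for a segment smaller than one pointer is an addition of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fill KERNARG, which is SEGMENT_SIZE bytes long, with the VARS pointer
   and, only if the kernel expects a second argument, the SHADOW pointer
   right after it.  Return true on success and false when SEGMENT_SIZE
   matches neither of the two layouts.  */
bool
fill_kernargs (char *kernarg, size_t segment_size, void *vars, void *shadow)
{
  size_t vars_size = sizeof (vars);

  /* Not in the patch: refuse segments that cannot even hold VARS.  */
  if (segment_size < vars_size)
    return false;

  memcpy (kernarg, &vars, vars_size);

  /* The kernel does not use the shadow argument; nothing more to copy.  */
  if (segment_size == vars_size)
    return true;

  /* Anything other than exactly one extra pointer is unexpected.  */
  if (segment_size != vars_size + sizeof (shadow))
    return false;

  memcpy (kernarg + vars_size, &shadow, sizeof (shadow));
  return true;
}
```

This keeps the allocation size as reported by the finalizer and avoids the out-of-bounds write for kernels that take only the single argument.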
Re: [PATCH] Properly assign to packet header (PR hsa/70394)
Hi,

On Thu, Mar 24, 2016 at 12:48:34PM +0100, Martin Liska wrote:
> Hello.
>
> Following patch initializes whole packet->header field, which is
> eventually stored to a packet in atomic manner. The function
> mechanism was adopted from the HSA runtime manual.
>
> I've been running bootstrap and regression tests.
> Ready to be installed after it finishes?
>
> Thanks,
> Martin
>
> libgomp/ChangeLog:
>
> 2016-03-24 Martin Liska
>
> * plugin/plugin-hsa.c (packet_store_release): New function
> that is taken from the HSA runtime manual.
> (GOMP_OFFLOAD_run): Use the function.

OK, thanks,

Martin
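The patch itself is not quoted in this message, so the following is only a sketch of what a release store of a packet header can look like. It assumes that the first 32 bits of a packet consist of a 16-bit header in the low half and 16 further bits (called rest here) in the high half, and it uses the GCC and Clang __atomic_store_n builtin; the exact signature is an assumption of this sketch, not a quote from plugin-hsa.c.

```c
#include <stdint.h>

/* Publish the first 32-bit word of a packet in one atomic store with
   release semantics, so that all earlier writes to the packet body are
   visible to a consumer that observes the new header.  HEADER goes to
   the low 16 bits, REST to the high 16 bits (assumed layout).  */
void
packet_store_release (uint32_t *packet, uint16_t header, uint16_t rest)
{
  /* Widen before shifting so that REST values with the top bit set do
     not overflow a signed int.  */
  uint32_t word = (uint32_t) header | ((uint32_t) rest << 16);
  __atomic_store_n (packet, word, __ATOMIC_RELEASE);
}
```

Writing the header as the last step, and as a single store, is what keeps a packet processor from seeing a packet whose header is already valid while the rest of it is still being filled in.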
Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit
Hi, On Mon, Mar 21, 2016 at 06:21:02PM +0800, Chung-Lin Tang wrote: > Hi, this is the set of patches from > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01411.html > revised again, this time also with audits for the HSA plugin. > > The changes are pretty minor, mainly that the unload_image hook now > receives similar error handling treatment. > > Tested again without regressions for nvptx and intelmic, however > while I was able to build the toolchain with HSA offloading support, I was > unsure how I could test it, as I currently don't have any AMD hardware (not > aware if there's an emulator like intelmic). I would be grateful if > the HSA folks can run them for me. I have just tested the whole patch-set on my HSA box (i.e. gomp.exp tests and all libgomp tests on trunk + some extra testing on the hsa branch) and found no issues. I have had a very superficial look over the patch and have no objections, but since I am not familiar with the issue this addresses and because I do not have a detailed understanding of the internals of copying data to/from devices, my opinion should not really count much. Nevertheless, thanks for thinking about HSA and making me aware of the change, Martin > > Thanks, > Chung-Lin > > ChangeLog for the libgomp proper parts, patch as attached. > > 2016-03-20 Chung-Lin Tang > > * target.c (gomp_device_copy): New function. > (gomp_copy_host2dev): Likewise. > (gomp_copy_dev2host): Likewise. > (gomp_free_device_memory): Likewise. > (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev(). > (gomp_map_pointer): Likewise. > (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle > NULL value from alloc_func plugin hook. > (gomp_unmap_tgt): Adjust to call gomp_free_device_memory(). > (gomp_copy_from_async): Adjust to call gomp_copy_dev2host(). > (gomp_unmap_vars): Likewise. > (gomp_update): Adjust to call gomp_copy_dev2host() and > gomp_copy_host2dev() functions. 
> (gomp_unload_image_from_device): Handle false value from > unload_image_func plugin hook. > (gomp_init_device): Handle false value from init_device_func > plugin hook. > (gomp_exit_data): Adjust to call gomp_copy_dev2host(). > (omp_target_free): Adjust to call gomp_free_device_memory(). > (omp_target_memcpy): Handle return values from host2dev_func, > dev2host_func, and dev2dev_func plugin hooks. > (omp_target_memcpy_rect_worker): Likewise. > (gomp_target_fini): Handle false value from fini_device_func > plugin hook. > * libgomp.h (struct gomp_device_descr): Adjust return type of > init_device_func, fini_device_func, unload_image_func, free_func, > dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'. > * oacc-host.c (host_init_device): Change return type to bool. > (host_fini_device): Likewise. > (host_unload_image): Likewise. > (host_free): Likewise. > (host_dev2host): Likewise. > (host_host2dev): Likewise. > * oacc-mem.c (acc_free): Handle plugin hook fatal error case. > (acc_memcpy_to_device): Likewise. > (acc_memcpy_from_device): Likewise. > (delete_copyout): Add libfnname parameter, handle free_func > hook fatal error case. > (acc_delete): Adjust delete_copyout call. > (acc_copyout): Likewise. > (update_dev_host): Move gomp_mutex_unlock to after > host2dev/dev2host hook calls. >
Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts
Hi, On Mon, Mar 21, 2016 at 06:22:17PM +0800, Chung-Lin Tang wrote: > Hi Martin, I think you're the one to CC for this, > as I mentioned in the first email, this has been build tested, however I did > not know if I could test this without a Radeon card. If convenient, > could you or anyone familiar with the setup do a make check-target-libgomp > with this patch series? > > Thanks, > Chung-Lin > > > * plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable > to 'hsa_error_msg', for clarity. > (hsa_fatal): Likewise. > (hsa_error): New function. > (init_hsa_context): Change return type to bool, adjust to return > false on error. > (queue_callback): Adjust to call hsa_error. > (GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context > return value. > (GOMP_OFFLOAD_init_device): Change return type to bool, adjust to > return false on error. > (get_agent_info): Adjust to return NULL on error. > (destroy_hsa_program): Change return type to bool, adjust to > return false on error. > (GOMP_OFFLOAD_load_image): Adjust to return -1 on error. > (destroy_module): Change return type to bool, adjust to > return false on error. > (GOMP_OFFLOAD_unload_image): Likewise. > (GOMP_OFFLOAD_fini_device): Likewise. > (GOMP_OFFLOAD_alloc): Change to return NULL when called. > (GOMP_OFFLOAD_free): Change to return false when called. > (GOMP_OFFLOAD_dev2host): Likewise. > (GOMP_OFFLOAD_host2dev): Likewise. > (GOMP_OFFLOAD_dev2dev): Likewise. On the whole, I am fine with the patch but there are two issues: First, and generally, when you change the return type of a function, you must document what return values mean in the comment of the function. Most importantly, it must be immediately apparent whether a function returns true or false on failure from its comment. So please fix that. Second... 
> Index: libgomp/plugin/plugin-hsa.c > === > --- libgomp/plugin/plugin-hsa.c (revision 234358) > +++ libgomp/plugin/plugin-hsa.c (working copy) > @@ -175,10 +175,10 @@ hsa_warn (const char *str, hsa_status_t status) >if (!debug) > return; > > - const char *hsa_error; > - hsa_status_string (status, &hsa_error); > + const char *hsa_error_msg; > + hsa_status_string (status, &hsa_error_msg); > > - fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error); > + fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, > hsa_error_msg); > } > > /* Report a fatal error STR together with the HSA error corresponding to > STATUS > @@ -187,12 +187,25 @@ hsa_warn (const char *str, hsa_status_t status) > static void > hsa_fatal (const char *str, hsa_status_t status) > { > - const char *hsa_error; > - hsa_status_string (status, &hsa_error); > + const char *hsa_error_msg; > + hsa_status_string (status, &hsa_error_msg); >GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str, > - hsa_error); > + hsa_error_msg); > } > > +/* Like hsa_fatal, except only report error message, and return FALSE > + for propagating error processing to outside of plugin. */ > + > +static bool > +hsa_error (const char *str, hsa_status_t status) > +{ > + const char *hsa_error_msg; > + hsa_status_string (status, &hsa_error_msg); > + GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str, > + hsa_error_msg); > + return false; > +} > + > struct hsa_kernel_description > { >const char *name; ... > /* Callback of dispatch queues to report errors. */ > @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status, > hsa_queue_t *queue __attribute__ ((unused)), > void *data __attribute__ ((unused))) > { > - hsa_fatal ("Asynchronous queue error", status); > + hsa_error ("Asynchronous queue error", status); > } ...I believe this hunk is wrong. Errors reported in this way mean that something is very wrong and generally happen during execution of code on HSA GPU, i.e. 
within GOMP_OFFLOAD_run. And since you left calls in create_single_kernel_dispatch, which is called as a part of GOMP_OFFLOAD_run, intact, I believe you actually want to leave hsa_fatal here too. Thanks, Martin
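As an illustration of the first point, this is the kind of function comment and return convention meant above, reduced to a stand-alone sketch. report_error and init_device are hypothetical stand-ins for hsa_error and the plugin hooks; the status argument simulates the result of a runtime call.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for hsa_error: report STR together with STATUS
   on stderr and return false, so that callers can propagate the failure
   with a single return statement.  */
bool
report_error (const char *str, int status)
{
  fprintf (stderr, "error: %s (status %d)\n", str, status);
  return false;
}

/* Hypothetical initialization hook.  STATUS simulates the result of a
   runtime call, zero meaning success.  Return true on success and false
   on failure, in which case the error has already been reported.  */
bool
init_device (int status)
{
  if (status != 0)
    return report_error ("device initialization failed", status);
  return true;
}
```

The comment of init_device states which value means failure, which is exactly the piece of information a caller needs once a function stops being void.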
Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts
Hi, On Sun, Mar 27, 2016 at 06:26:29PM +0800, Chung-Lin Tang wrote: > On 2016/3/25 上午 02:40, Martin Jambor wrote: > > On the whole, I am fine with the patch but there are two issues: > > > > First, and generally, when you change the return type of a function, > > you must document what return values mean in the comment of the > > function. Most importantly, it must be immediately apparent whether a > > function returns true or false on failure from its comment. So please > > fix that. > > Thanks, I'll update on that. > > >> > /* Callback of dispatch queues to report errors. */ > >> > @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status, > >> > hsa_queue_t *queue __attribute__ ((unused)), > >> > void *data __attribute__ ((unused))) > >> > { > >> > - hsa_fatal ("Asynchronous queue error", status); > >> > + hsa_error ("Asynchronous queue error", status); > >> > } > > ...I believe this hunk is wrong. Errors reported in this way mean > > that something is very wrong and generally happen during execution of > > code on HSA GPU, i.e. within GOMP_OFFLOAD_run. And since you left > > calls in create_single_kernel_dispatch, which is called as a part of > > GOMP_OFFLOAD_run, intact, I believe you actually want to leave > > hsa_fatel here too. > > Yes, a fatal exit is okay within the 'run' hook, since we're not holding > the device lock there. I was only trying to audit the > GOMP_OFFLOAD_init_device() > function, where the queues are created. > > I'm not familiar with the HSA runtime API; will the callback only be triggered > during GPU kernel execution (inside the 'run' hook), and not for example, > within hsa_queue_create()? If so, then yes as you advised, the above change to > queue_callback() should be reverted. > The documentation says the callback is "invoked by the HSA runtime for every asynchronous event related to the newly created queue." All enumerated situations when the callback is called happen at command launch time (i.e. inside a run hook). 
Since creation of the queue is a synchronous event, the callback should not be invoked if it fails. Of course, the description does not rule out that such failures occur out of the blue at an arbitrary time. But I think this is as improbable as a GOMP_PLUGIN_malloc call ending up in a fatal error, which is something you do not seem to be worried about. So please revert the hunk. Thanks, Martin
Re: [PATCH 1/2] HSA: support alignment for hsa_symbols (PR hsa/70391)
Hi, this is OK with one small adjustment in a comment: On Tue, Mar 22, 2016 at 03:51:53PM +0100, Martin Liska wrote: > gcc/ChangeLog: > > 2016-03-23 Martin Liska > > PR hsa/70391 > * hsa-brig.c (emit_directive_variable): Emit alignment > according to hsa_symbol::m_align. > * hsa-dump.c (hsa_byte_alignment): Move the function to > another file. > (dump_hsa_symbol): Dump alignment of HSA symbols. > * hsa-gen.c (get_symbol_for_decl): Set-up alignment > of a symbol. > (gen_hsa_addr_with_align): New function. > (hsa_bitmemref_alignment): Use newly added function. > (gen_hsa_insns_for_load): Likewise. > (gen_hsa_insns_for_store): Likewise. > (gen_hsa_memory_copy): New argument added. > (gen_hsa_insns_for_single_assignment): Respect > alignment for assignments processed via > gen_hsa_memory_copy. > (gen_hsa_insns_for_direct_call): Likewise. > (gen_hsa_insns_for_return): Likewise. > (gen_function_def_parameters): Set default > alignment. > * hsa.c (hsa_object_alignment): New function. > (hsa_byte_alignment): Pasted function. > * hsa.h (hsa_symbol::m_align): New field. 
> --- > gcc/hsa-brig.c | 5 +--- > gcc/hsa-dump.c | 13 ++--- > gcc/hsa-gen.c | 88 > +- > gcc/hsa.c | 20 + > gcc/hsa.h | 8 +- > 5 files changed, 99 insertions(+), 35 deletions(-) > > diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c > index 72eecf9..db39813 100644 > --- a/gcc/hsa-gen.c > +++ b/gcc/hsa-gen.c > @@ -169,12 +169,12 @@ hsa_symbol::hsa_symbol () > > hsa_symbol::hsa_symbol (BrigType16_t type, BrigSegment8_t segment, > BrigLinkage8_t linkage, bool global_scope_p, > - BrigAllocation allocation) > + BrigAllocation allocation, BrigAlignment8_t align) >: m_decl (NULL_TREE), m_name (NULL), m_name_number (0), > m_directive_offset (0), m_type (type), m_segment (segment), > m_linkage (linkage), m_dim (0), m_cst_value (NULL), > m_global_scope_p (global_scope_p), m_seen_error (false), > -m_allocation (allocation), m_emitted_to_brig (false) > +m_allocation (allocation), m_emitted_to_brig (false), m_align (align) > { > } > > @@ -908,21 +908,29 @@ get_symbol_for_decl (tree decl) > { >hsa_symbol *sym; >gcc_assert (TREE_CODE (decl) == VAR_DECL); > + BrigAlignment8_t align = hsa_object_alignment (decl); > >if (is_in_global_vars) > { > sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_GLOBAL, > BRIG_LINKAGE_PROGRAM, true, > - BRIG_ALLOCATION_PROGRAM); > + BRIG_ALLOCATION_PROGRAM, align); > hsa_cfun->m_global_symbols.safe_push (sym); > } >else > { > + /* As generation of memory copy instructions relies on alignment > + greater or equal to 8 bytes, we need to increase alignment > + of all aggregate types.. */ Let's say "efficient memory copy instructions." It is of course possible to use slower ones. Thanks, Martin
Re: [PATCH 2/2] HSA: handle alignment of string builtins (PR hsa/70391)
Hi, On Wed, Mar 23, 2016 at 02:43:17PM +0100, Martin Liska wrote: > gcc/ChangeLog: > > 2016-03-23 Martin Liska > > PR hsa/70391 > * hsa-gen.c (hsa_function_representation::update_cfg): New > function. > (convert_addr_to_flat_segment): Likewise. > (gen_hsa_memory_set): New alignment argument. > (gen_hsa_ctor_assignment): Likewise. > (gen_hsa_insns_for_single_assignment): Provide alignment > to gen_hsa_ctor_assignment. > (gen_hsa_insns_for_direct_call): Add new argument. > (expand_lhs_of_string_op): New function. > (expand_string_operation_builtin): Likewise. > (expand_memory_copy): New function. > (expand_memory_set): New function. > (gen_hsa_insns_for_call): Use HOST_WIDE_INT. > (convert_switch_statements): Change signature. > (generate_hsa): Use a return value of the function. > (pass_gen_hsail::execute): Do not call > convert_switch_statements here. > * hsa-regalloc.c (hsa_regalloc): Call update_cfg. > * hsa.h (hsa_function_representation::m_need_cfg_update): > New flag. > (hsa_function_representation::update_cfg): New function. As we already discussed, update_cfg and m_need_cfg_update should really be called differently, because the CFG has already been modified and only dominance needs to be re-computed. If you haven't thought about any names yet, what about m_modified_cfg and update_dominance ()? > --- > gcc/hsa-gen.c | 372 > ++--- > gcc/hsa-regalloc.c | 1 + > gcc/hsa.h | 9 +- > 3 files changed, 275 insertions(+), 107 deletions(-) > > diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c > index db39813..db7fc3d 100644 > --- a/gcc/hsa-gen.c > +++ b/gcc/hsa-gen.c > @@ -214,7 +214,7 @@ hsa_symbol::fillup_for_decl (tree decl) > should be set to number of SSA names used in the function. 
*/ > > hsa_function_representation::hsa_function_representation > - (tree fdecl, bool kernel_p, unsigned ssa_names_count) > + (tree fdecl, bool kernel_p, unsigned ssa_names_count, bool need_cfg_update) >: m_name (NULL), > m_reg_count (0), m_input_args (vNULL), > m_output_arg (NULL), m_spill_symbols (vNULL), m_global_symbols (vNULL), > @@ -223,7 +223,8 @@ hsa_function_representation::hsa_function_representation > m_in_ssa (true), m_kern_p (kernel_p), m_declaration_p (false), > m_decl (fdecl), m_internal_fn (NULL), m_shadow_reg (NULL), > m_kernel_dispatch_count (0), m_maximum_omp_data_size (0), > -m_seen_error (false), m_temp_symbol_count (0), m_ssa_map () > +m_seen_error (false), m_temp_symbol_count (0), m_ssa_map (), > +m_need_cfg_update (need_cfg_update) > { >int sym_init_len = (vec_safe_length (cfun->local_decls) / 2) + 1;; >m_local_symbols = new hash_table (sym_init_len); > @@ -319,6 +320,16 @@ hsa_function_representation::init_extra_bbs () >hsa_init_new_bb (EXIT_BLOCK_PTR_FOR_FN (cfun)); > } > > +void > +hsa_function_representation::update_cfg () > +{ > + if (m_need_cfg_update) > +{ > + free_dominance_info (CDI_DOMINATORS); > + calculate_dominance_info (CDI_DOMINATORS); > +} > +} > + > hsa_symbol * > hsa_function_representation::create_hsa_temporary (BrigType16_t type) > { > @@ -2246,30 +2257,14 @@ gen_hsa_addr_for_arg (tree tree_type, int index) >return new hsa_op_address (sym); > } > > -/* Generate HSA instructions that calculate address of VAL including all > - necessary conversions to flat addressing and place the result into DEST. > +/* Generate HSA instructions that process all necessary conversions > + of an ADDR to flat addressing and place the result into DEST. > Instructions are appended to HBB. 
*/ > > static void > -gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb) > +convert_addr_to_flat_segment (hsa_op_address *addr, hsa_op_reg *dest, > + hsa_bb *hbb) > { > - /* Handle cases like tmp = NULL, where we just emit a move instruction > - to a register. */ > - if (TREE_CODE (val) == INTEGER_CST) > -{ > - hsa_op_immed *c = new hsa_op_immed (val); > - hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, > - dest->m_type, dest, c); > - hbb->append_insn (insn); > - return; > -} > - > - hsa_op_address *addr; > - > - gcc_assert (dest->m_type == hsa_get_segment_addr_type (BRIG_SEGMENT_FLAT)); > - if (TREE_CODE (val) == ADDR_EXPR) > -val = TREE_OPERAND (val, 0); > - addr = gen_hsa_addr (val, hbb); >hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_LDA); >insn->set_op (1, addr); >if (addr->m_symbol && addr->m_symbol->m_segment != BRIG_SEGMENT_GLOBAL) > @@ -2298,6 +2293,34 @@ gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb > *hbb) > } > } > > +/* Generate HSA instructions that calculate address of VAL including all > + necessary conversions to flat
Re: [PATCH 2/2] Fix PR hsa/70402
Hi, On Thu, Mar 31, 2016 at 12:50:54PM +0200, Martin Liska wrote: > On 03/29/2016 01:44 PM, Martin Liška wrote: > > Second part of the patch set which omits one split_block (compared to the > > original patch). > > Acceptable just in case the first part will be accepted. > > > > Thanks > > Martin > > > > Hi. > > I'm sending v3 of the patch which does not immediately update dominator, > but sets a flag that eventually triggers the update. > The patch is OK after you change the name of the flag (introduced in a different patch) to the new one. Thanks, Martin
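The "set a flag now, recompute later" scheme described for v3 can be sketched in isolation as follows. struct function_repr, note_cfg_modified and the counter are hypothetical; the counter merely stands in for the free_dominance_info and calculate_dominance_info calls, and clearing the flag after the update is a choice of this sketch rather than something the quoted patch does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for hsa_function_representation, using the flag
   name proposed earlier in the thread.  */
struct function_repr
{
  bool modified_cfg;
  unsigned dominance_recomputations;
};

/* Record that the CFG of F has changed, for example after splitting a
   block.  Cheap, so it can be called for every modification.  */
void
note_cfg_modified (struct function_repr *f)
{
  f->modified_cfg = true;
}

/* Recompute dominance for F, but only if the CFG changed since the last
   update.  The counter stands in for the real recomputation.  */
void
update_dominance (struct function_repr *f)
{
  if (!f->modified_cfg)
    return;

  f->dominance_recomputations++;
  f->modified_cfg = false;
}
```

However many blocks get split while expanding a function, dominance is then recomputed at most once, at the point where a consumer such as the register allocator needs it.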
Re: Splitting up gcc/omp-low.c?
Hi, On Fri, Apr 08, 2016 at 11:36:03AM +0200, Thomas Schwinge wrote: > Hi! > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek wrote: > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote: > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote: > > > > > > > >In addition to that, how about we split up gcc/omp-low.c into several > > > >files? Would it make sense (I have not yet looked in detail) to do so > > > >along the borders of the several passes defined therein? Or, can you > > > >tell already that there would be too many cross-references between the > > > >several files to make this infeasible? > > > > > > It would be nice to get rid of all the code duplication in that file. That > > > alone could reduce the size by quite a bit, and hopefully make it easier > > > to > > > read. > > > > What exact code duplication do you mean? > > (Has been discussed in the following.) At this point, I do not intend to > work on any kinds of cleanup, but rather just the "mechanical" changes: > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to > > > achieve. > > > > Yeah, that might be the possible splitting boundary (have omp-low.c, > > omp-exp.c). > > Right. And possibly some kind of omp-simd.c, and omp-checking.c, and so > on, if feasible. (I have not yet looked in detail.) > > > > >I'd suggest to do this shortly before GCC 6 is released, so that > > > >backports from trunk to gcc-6-branch will be easy. (I assume we don't > > > >have to care for gcc-5-branch backports too much any longer.) > > > > > > I'll declare myself agnostic as to whether such a change is appropriate > > > for > > > gcc-6 at this stage. I guess it kind of depends on the specifics. > > > > Certainly. 
On one side I'd say it is too late now in stage3, on the other > > side when would be better time to do that, during stage1 people will have > > more likely out of the tree branches with more changes (I'm aware we even > > now have the HSA, OpenMP -> PTX and OpenACC branches). > > > > So, if somebody wants to try that, we can see if the result would be > > appropriate. > > So, has time now come to execute this task? (To remind: the idea > explicitly has been to do this late, shortly before the gcc-6-branch gets > created, to make it easy in the following months to apply patches to both > trunk and gcc-6-branch.) > Unless someone is quicker, I can give it a go next Thursday (not any sooner, unfortunately). I would do a division into omp-low.c and omp-exp.c and possibly an omp.c for simple stuff not fitting anywhere else and perhaps even a separate omp-gridify.c. Someone else would have to put stuff into an omp-simd.c, I'm afraid. But we can go about this incrementally. Thanks, Martin
[patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals
Hi,

I ran into an ICE when compiling the following function on the HSA
branch:

foo (int n, int m, int o, int (*a)[m][o])
{
  int i, j, k;
#pragma omp target teams distribute parallel for shared(a) firstprivate(n, m, o) private(i,j,k)
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      for (k = 0; k < o; k++)
        a[i][j][k] = i + j + k;
}

The problem is that when I duplicate the loop with
copy_gimple_seq_and_replace_locals, I get one extra re-mapping.
Specifically, I feed the function this:

{
  int i.2;

  #pragma omp teams shared(a) firstprivate(n) firstprivate(m) firstprivate(o) shared(m.1) shared(D.3275) shared(o.0)
    {
      #pragma omp distribute private(i.2)
      for (i.2 = 0; i.2 < n; i.2 = i.2 + 1)
        {
          #pragma omp parallel shared(a) firstprivate(n) firstprivate(m) firstprivate(o) private(i) private(j) private(k) shared(m.1) shared(D.3275) shared(o.0)
            {
              sizetype D.3286;
              long unsigned int D.3287;
              sizetype D.3288;
              sizetype D.3289;
              sizetype D.3290;
              long unsigned int D.3291;
              long unsigned int D.3292;
              int[0:D.3279][0:D.3271] * D.3293;
              int D.3294;
              int D.3295;

              #pragma omp for nowait
              for (i = 0; i < n; i = i + 1)
                {
                  j = 0;
                  goto ;
                  :
                  k = 0;
                  goto ;
                  :
                  D.3286 = D.3275 /[ex] 4;     <--- here I get wrong decl
                  D.3287 = (long unsigned int) i;
                  D.3288 = (sizetype) o.0;
                  D.3289 = (sizetype) m.1;
                  D.3290 = D.3288 * D.3289;
                  D.3291 = D.3287 * D.3290;
                  D.3292 = D.3291 * 4;
                  D.3293 = a + D.3292;
                  D.3294 = i + j;
                  D.3295 = D.3294 + k;
                  *D.3293[j]{lb: 0 sz: D.3286 * 4}[k] = D.3295;
                  k = k + 1;
                  :
                  if (k < o) goto ; else goto ;
                  :
                  j = j + 1;
                  :
                  if (j < m) goto ; else goto ;
                  :
                }
            }
        }
    }
}

and it replaces D.3275 with its new copy with undefined value.  The
mapping is created when an array type where the size is defined in
terms of that variable declaration is copied.  The comment in
type-remapping code says that we "use the already remaped data" but
that is not true.
My solution was to prevent declaration duplication in this case with yet another state variable in struct copy_body_data that holds a special value when we are running copy_gimple_seq_and_replace_locals and another when we are within type-remapping. I'll be happy for any suggestion how to deal with this without cluttering copy_body_data even more but so far I have not found any. If nobody has a better idea, is the following good for trunk? (I am about to commit it to the hsa branch.) It has passed bootstrap and testing on x86_64-linux. Thanks, Martin 2016-01-06 Martin Jambor * tree-inline.h (copy_body_data): New field decl_creation_prevention_level. Moved remap_var_for_cilk to minimize padding. * tree-inline.c (remap_decl): Return original decls if decl_creation_prevention_level is two or bigger. (remap_type_1): Increment and decrement decl_creation_prevention_level if appropriate. (copy_gimple_seq_and_replace_locals): Set decl_creation_prevention_level to 1. diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 88a6753..2df11a2 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -340,8 +340,20 @@ remap_decl (tree decl, copy_body_data *id) return decl; } - /* If we didn't already have an equivalent for this declaration, - create one now. */ + /* If decl copying is forbidden (which happens when copying a type with size + defined outside of the copied sequence) work with the original decl. */ + if (!n + && id->decl_creation_prevention_level > 1 + && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL)) +{ + if (id->do_not_unshare) + return decl; + else + return unshare_expr (decl); +} + + /* If we didn't already have an equivalent for this declaration, create one + now. */ if (!n) { /* Make a copy of the variable or label. */ @@ -526,7 +538,10 @@ remap_type_1 (tree type, copy_body_data *id) gcc_unreachable (); } - /* All variants of type share the same size, so use the already remaped data. 
*/ + /* All variants of type share the same size, so use the already remaped + data. */ + if (id->decl_creation_prevention_level > 0) +id->decl_creation_prevention_level++; if (TYPE_MAIN
[PR ipa/66616] Fix artificial thunk ABI issues
Hi, i386 -m32 failure of the PR 66616 testcase was caused by the fact that, on the callee side, the calling conventions of a thunk are decided according to the properties of the function it is associated with, but on the caller side, the actual thunk is examined. Since they depend on the can_change_signature cgraph_node flag and the flag of artificial thunks has not been copied from the function, the caller and callee could disagree on ABI. Fixed thusly, by copying the flag to the artificial thunk. Testcase is already in the testsuite (g++.dg/ipa/pr66616.C). The patch has successfully passed bootstrap and testing on i686-linux, I have also included it in a bootstrap and testing that is underway on x86_64-linux. OK if it passes there as well? Thanks, Martin [PR ipa/66616] Copy can_change_signature flag to artificial thunks 2016-01-07 Martin Jambor PR ipa/66616 * cgraphclones.c (duplicate_thunk_for_node): Copy can_change_signature flag. diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index f8a7d37..8759ce4 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -328,6 +328,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node) new_thunk = cgraph_node::create (new_decl); set_new_clone_decl_and_node_flags (new_thunk); new_thunk->definition = true; + new_thunk->local.can_change_signature = node->local.can_change_signature; new_thunk->thunk = thunk->thunk; new_thunk->unique_name = in_lto_p; new_thunk->former_clone_of = thunk->decl;
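
The caller/callee disagreement described above can be illustrated with a small toy model. It is a sketch under stated assumptions: struct node, thunk_target and the two "sees_flag" helpers are hypothetical simplifications and do not correspond to the real cgraph_node API; they only model "the callee consults the associated function, the caller consults the thunk itself".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a call-graph node.  */
struct node
{
  bool can_change_signature;
  const struct node *thunk_target;  /* Function a thunk forwards to, or NULL.  */
};

/* Callee side: the decision is based on the function the thunk is
   associated with.  */
static bool
callee_sees_flag (const struct node *n)
{
  const struct node *fn = n->thunk_target ? n->thunk_target : n;
  return fn->can_change_signature;
}

/* Caller side: the decision is based on the node that is actually called.  */
static bool
caller_sees_flag (const struct node *n)
{
  return n->can_change_signature;
}

/* Create an artificial thunk for FN; COPY_FLAG models the fix.  */
static struct node
make_thunk (const struct node *fn, bool copy_flag)
{
  struct node thunk = { false, fn };
  if (copy_flag)
    thunk.can_change_signature = fn->can_change_signature;
  return thunk;
}
```

In this model the two sides can only disagree when the flag is not copied, which is the situation the one-line patch removes.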
[PR 69044] Do not clone for parameter removal when !can_change_signature
Hi, we generally do not have the ability to propagate constants to and clone CHKP instrumented functions. Therefore we do not propagate stuff into their lattices but since Honza changed cloning for all contexts heuristics a few weeks ago, we might attempt to clone them for unused parameter removal, which then leads to an ICE (and all sorts of issues). The heuristics however should not attempt to do that because the function cgraph_node has can_change_signature flag cleared. So this patch changes it accordingly. Bootstrapped and tested on x86_64-linux, OK for trunk? Thanks, Martin 2016-01-08 Martin Jambor PR ipa/69044 * ipa-cp.c (estimate_local_effects): Do not clone for removal of useless parameters if we cannot change function signature. testsuite/ * gcc.target/i386/chkp-pr69044.c: New test. diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index 782df71..d99e69c 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -2518,7 +2518,8 @@ estimate_local_effects (struct cgraph_node *node) known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs); int devirt_bonus = devirtualization_time_bonus (node, known_csts, known_contexts, known_aggs_ptrs); - if (always_const || devirt_bonus || removable_params_cost) + if (always_const || devirt_bonus + || (removable_params_cost && node->local.can_change_signature)) { struct caller_statistics stats; inline_hints hints; diff --git a/gcc/testsuite/gcc.target/i386/chkp-pr69044.c b/gcc/testsuite/gcc.target/i386/chkp-pr69044.c new file mode 100644 index 000..933e88a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/chkp-pr69044.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */ + +int i; +int strncasecmp (char *p1, char *p2, long p3) { return 0; } +int special_command () +{ + if (strncasecmp (0, 0, 0)) +i++; +}
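
The changed condition can be read in isolation as a small predicate. The sketch below is only an illustration; the function name worth_considering_clone and its plain-value parameters are hypothetical, while the boolean structure mirrors the patched condition in estimate_local_effects.

```c
#include <stdbool.h>

/* Returns true when cloning for all contexts is worth evaluating.
   Removable parameters alone justify it only if the signature of the
   function may be changed.  */
static bool
worth_considering_clone (bool always_const, int devirt_bonus,
                         int removable_params_cost,
                         bool can_change_signature)
{
  return always_const
         || devirt_bonus != 0
         || (removable_params_cost != 0 && can_change_signature);
}
```

A function with removable parameters but a fixed signature (as with the CHKP instrumented function in the test case) no longer qualifies on that ground alone, while the other two reasons are unaffected.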
Re: [hsa 2/10] Modifications to libgomp proper
Hi, On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote: > On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote: > > > > --- a/libgomp/task.c > > > > +++ b/libgomp/task.c > > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data) > > > >gomp_mutex_unlock (&team->task_lock); > > > > } > > > >ttask->state = GOMP_TARGET_TASK_FINISHED; > > > > + free (ttask->firstprivate_copies); > > > >gomp_target_task_completion (team, task); > > > >gomp_mutex_unlock (&team->task_lock); > > > > } > > > > > > So, this function should have a special case for the SHARED_MEM case, > > > handle > > > it closely to say how GOMP_taskgroup_end handles the finish_cancelled: > > > case. Just note that the target task is missing from certain queues at > > > that > > > point. > > > > I'm afraid I need some help here. I do not quite understand how is > > finish_cancelled in GOMP_taskgroup_end similar, it seems to be doing > > much more than freeing one pointer. What is exactly the issue with > > the above? > > > > Nevertheless, after reading through bits of task.c again, I wonder > > whether any copying (for both shared memory target and the host) in > > gomp_target_task_fn is actually necessary because it seems to be also > > done in gomp_create_target_task. Does that not apply somehow? > > The target task is scheduled for the first action as normal task, and the > scheduling of it already removes it from some of the queues (each task is > put into 1-3 queues), i.e. actions performed mostly by > gomp_task_run_pre. Then the team task lock is unlocked and the task is run. > Finally, for normal tasks, gomp_task_run_post_handle_depend, > gomp_task_run_post_remove_parent, etc. is run. Now, for async target tasks > that have something running on some other device at that point, we don't do > that, but instead make it GOMP_TASK_ASYNC_RUNNING. And continue with other > stuff, until gomp_target_task_completion is run. 
> For non-shared mem that needs to readd the task again into the queues, so > that it will be scheduled again. But you don't need that for shared mem > target tasks, they can just free the firstprivate_copies and finalize the > task. > At the time gomp_target_task_completion is called, the task is pretty much > in the same state as it is around the finish_cancelled:; label. > So instead of what the gomp_target_task_completion function does, > you would for SHARED_MEM do something like: > size_t new_tasks > = gomp_task_run_post_handle_depend (task, team); > gomp_task_run_post_remove_parent (task); > gomp_clear_parent (&task->children_queue); > gomp_task_run_post_remove_taskgroup (task); > team->task_count--; > do_wake = 0; > if (new_tasks > 1) > { > do_wake = team->nthreads - team->task_running_count > - !task->in_tied_task; > if (do_wake > new_tasks) > do_wake = new_tasks; > } > // Unlike other places, the following will be also run with the > // task_lock held, but I'm afraid there is nothing to do about it. > // See the comment in gomp_target_task_completion. > gomp_finish_task (task); > free (task); > if (do_wake) > gomp_team_barrier_wake (&team->barrier, do_wake); > I tried the above but libgomp testcase target-33.c always got stuck within GOMP_taskgroup_end call, more specifically in gomp_team_barrier_wait_end in config/linux/bar.c where the first call to gomp_barrier_handle_tasks left the barrier->generation as BAR_WAITING_FOR_TASK and then nothing ever happened, even as the callbacks fired. After looking into the tasking mechanism for basically the whole day yesterday, I *think* I fixed it by calling gomp_team_barrier_set_task_pending from the callback and another hunk in gomp_barrier_handle_tasks so that it clears that barrier flag even if it has not picked up any tasks. Please let me know if you think it makes sense. If so, I'll include it in an HSA patch set I hope to generate today. 
Otherwise I guess I'd prefer to remove the shared-memory path and revert to old behavior as a temporary measure until we find out what was wrong. Thanks and sorry that it took me so long to resolve this, Martin diff --git a/libgomp/task.c b/libgomp/task.c index ab5df51..828c1fb 100644 --- a/libgomp/task.c +++ b/libgomp/task.c @@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, str
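
The wake-up count in the quoted SHARED_MEM suggestion can be looked at on its own. The following is a sketch only: compute_do_wake is a hypothetical helper taking plain values instead of the team and task structures, and the explicit cast spells out the int to size_t conversion that the quoted comparison performs implicitly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of threads to wake after a task finished and made NEW_TASKS
   dependent tasks runnable.  Nothing is woken for zero or one new task.  */
static int
compute_do_wake (size_t new_tasks, int nthreads, int task_running_count,
                 bool in_tied_task)
{
  int do_wake = 0;
  if (new_tasks > 1)
    {
      /* Threads that are neither running a task nor the current one.  */
      do_wake = nthreads - task_running_count - !in_tied_task;
      /* Never wake more threads than there are new tasks.  The comparison
         is done in size_t, as in the quoted snippet.  */
      if ((size_t) do_wake > new_tasks)
        do_wake = (int) new_tasks;
    }
  return do_wake;
}
```

For example, with 8 threads, 2 running tasks and 3 new tasks, at most 3 threads are woken even though 5 are idle.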
Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals
Hi, On Mon, Jan 11, 2016 at 05:38:47PM +0100, Jakub Jelinek wrote: > On Mon, Jan 11, 2016 at 09:41:31AM +0100, Richard Biener wrote: > > Hum. Can't you check id->remapping_type_depth? For some reason, last week I reached the conclusion that no. But I must have done something wrong because I have tested it again today and just never creating a new decl in remap_decl if id->remapping_type_depth is non zero is good enough for my testcase and it survives bootstrap and testing too (previously I thought it did not). id->remapping_type_depth seems to be incremented for DECL_VALUE_EXPR as well, so it actually might help in that situation too. > That said, how do > > we end up recursing into remap_decl when copying the variable length > > decl/type? Can't we avoid the recursion (basically avoid remapping > > variable-size types at all?) Here I agree with Jakub that there are situations where we have to. There is a comment towards the end of remap_type_1 saying that when remapping types, all required decls should have already been mapped. If that is correct, and I believe it is, the remapping_type_depth test should be fine. > > I guess it depends, VLA types that refer in their various gimplified > expressions only to decls defined outside of bind stmts we are duplicating > are fine as is, they don't need remapping, or could be remapped to VLA types > that use all the same temporary decls. > VLAs that have some or all references to decls inside of the bind stmts > we are duplicating IMHO need to be remapped. > So, perhaps we need to remap_decls in replace_locals_stmt in two phases > in presence of VLAs (or also vars with DECL_VALUE_EXPR) I'm a bit worried what would happen to local DECLs that are pointers to VLAs, because... > - phase 1 would just walk the > for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var)) > { > if (!can_be_nonlocal (old_var, id) > && ! 
variably_modified_type_p (TREE_TYPE (old_var), id->src_fn)) ...variably_modified_type_p seems to return true for them and... > remap_decl (old_var, id); > } > - phase 2 - do the full remap_decls, but during that arrange that > remap_decl for non-zero id->remapping_type_depth if (!n) just returns > decl ...they would not be copied here because remap_decl would not be duplicating stuff. So I'd end up with an original local decl when I actually need a duplicate. But let me go with just checking the remapping_type_depth for now. Thanks for looking into this, Martin > That way, I think if the types refer to some temporaries that are defined > in the bind stmts being copied, they will be properly duplicated, otherwise > they will be shared. > So, we'd need some flag in *id (just bool bitfield would be enough) that would > allow replace_locals_stmt to set it before the remap_decls call in phase 2 > and clear it afterwards, and use that flag together with > id->remapping_type_depth in remap_decls. > > Jakub
Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals
On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> >     remap_decl (old_var, id);
> >   }
> > - phase 2 - do the full remap_decls, but during that arrange that
> >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> >   decl
>
> ...they would not be copied here because remap_decl would not be
> duplicating stuff.  So I'd end up with an original local decl when I
> actually need a duplicate.
>

ugh, I'm trying to be too fast and obviously forgot about the
id->remapping_type_depth part of the proposed condition.  Still, when
could relying solely on id->remapping_type_depth fail?

Sorry for the noise,

Martin
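
The rule discussed in this sub-thread (reuse an existing mapping, but never create a new decl while a type is being remapped) can be modelled in a few lines. This is a toy sketch: struct decl, struct copy_ctx, the fixed-size tables and remap_decl_sketch are all hypothetical and are not the tree-inline.c data structures; only the field name remapping_type_depth is taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DECLS 8

/* Hypothetical stand-in for a declaration.  */
struct decl { int uid; };

/* Hypothetical stand-in for the copy state.  */
struct copy_ctx
{
  const struct decl *from[MAX_DECLS];  /* Originals already mapped.  */
  struct decl *to[MAX_DECLS];          /* Their copies.  */
  struct decl pool[MAX_DECLS];         /* Storage for the copies.  */
  size_t n_maps;
  int remapping_type_depth;            /* Non-zero while remapping a type.  */
};

static struct decl *
remap_decl_sketch (struct decl *d, struct copy_ctx *id)
{
  /* An existing mapping is always honoured.  */
  for (size_t i = 0; i < id->n_maps; i++)
    if (id->from[i] == d)
      return id->to[i];

  /* Inside type remapping, do not create a new decl: a size decl defined
     outside the copied sequence must keep referring to the original.  */
  if (id->remapping_type_depth != 0)
    return d;

  /* Otherwise create and record a fresh copy.  */
  assert (id->n_maps < MAX_DECLS);
  struct decl *copy = &id->pool[id->n_maps];
  *copy = *d;
  id->from[id->n_maps] = d;
  id->to[id->n_maps] = copy;
  id->n_maps++;
  return copy;
}
```

In this model a local decl mapped before its type is visited is still found during type remapping, which is the property the remap_type_1 comment relies on; whether that always holds in the real code is exactly the open question in the message above.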
Re: [hsa 2/10] Modifications to libgomp proper
Hi, On Tue, Jan 12, 2016 at 02:38:15PM +0100, Jakub Jelinek wrote: > On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote: > > GOMP_kernel_launch_attributes should not be there (it is a > > reminiscence from before the device-specific target arguments) and > > should be moved just to the HSA plugin. I'll prepare a patch today. > > > > While we do not have to share GOMP_hsa_kernel_dispatch, we actually do > > use them in both the plugin and the compiler, where we only use it in > > an offsetof, so that we only have the structure defined once. > > But, even using it in offsetof might be wrong, the compiler could be a > cross-compiler, and you'd use offsetof on the host, while you want it for > the target, and that would be different. > So, IMHO you need (unless you already have) built the structure as a tree > type, lay it out, and then you can use at TYPE_SIZE_UNIT or > DECL_FIELD_OFFSET and the like. > I see. For now I have just put a FIXME there but have talked to Martin about laying out the type properly. This is what I have committed to the branch. Thanks, Martin 2016-01-12 Martin Jambor include/ * gomp-constants.h (GOMP_kernel_launch_attributes): Removed. (GOMP_hsa_kernel_dispatch): Likewise. libgomp/ * plugin/plugin-hsa.c (GOMP_kernel_launch_attributes): Moved here. (GOMP_hsa_kernel_dispatch): Likewise. gcc/ * hsa-gen.c (GOMP_hsa_kernel_dispatch): Moved here. --- gcc/hsa-gen.c | 35 + include/gomp-constants.h| 44 -- libgomp/plugin/plugin-hsa.c | 47 + 3 files changed, 82 insertions(+), 44 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index 1715b57..f633dfd 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3747,6 +3747,41 @@ gen_set_num_threads (tree value, hsa_bb *hbb) hbb->append_insn (basic); } +/* Collection of information needed for a dispatch of a kernel from a + kernel. Keep in sync with libgomp's plugin-hsa.c. + + FIXME: In order to support cross-compilations, we need to lay ot the type as + a tree and then use field_decl positions. 
+ */ + +struct GOMP_hsa_kernel_dispatch +{ + /* Pointer to a command queue associated with a kernel dispatch agent. */ + void *queue; + /* Pointer to reserved memory for OMP data struct copying. */ + void *omp_data_memory; + /* Pointer to a memory space used for kernel arguments passing. */ + void *kernarg_address; + /* Kernel object. */ + uint64_t object; + /* Synchronization signal used for dispatch synchronization. */ + uint64_t signal; + /* Private segment size. */ + uint32_t private_segment_size; + /* Group segment size. */ + uint32_t group_segment_size; + /* Number of children kernel dispatches. */ + uint64_t kernel_dispatch_count; + /* Number of threads. */ + uint32_t omp_num_threads; + /* Debug purpose argument. */ + uint64_t debug; + /* Levels-var ICV. */ + uint64_t omp_level; + /* Kernel dispatch structures created for children kernel dispatches. */ + struct GOMP_hsa_kernel_dispatch **children_dispatches; +}; + /* Return an HSA register that will contain number of threads for a future dispatched kernel. Instructions are added to HBB. */ diff --git a/include/gomp-constants.h b/include/gomp-constants.h index 1dae474..a8e7723 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -256,48 +256,4 @@ enum gomp_map_kind /* Identifiers of device-specific target arguments. */ #define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (1 << 8) -/* Structure describing the run-time and grid properties of an HSA kernel - lauch. */ - -struct GOMP_kernel_launch_attributes -{ - /* Number of dimensions the workload has. Maximum number is 3. */ - uint32_t ndim; - /* Size of the grid in the three respective dimensions. */ - uint32_t gdims[3]; - /* Size of work-groups in the respective dimensions. */ - uint32_t wdims[3]; -}; - -/* Collection of information needed for a dispatch of a kernel from a - kernel. */ - -struct GOMP_hsa_kernel_dispatch -{ - /* Pointer to a command queue associated with a kernel dispatch agent. 
*/ - void *queue; - /* Pointer to reserved memory for OMP data struct copying. */ - void *omp_data_memory; - /* Pointer to a memory space used for kernel arguments passing. */ - void *kernarg_address; - /* Kernel object. */ - uint64_t object; - /* Synchronization signal used for dispatch synchronization. */ - uint64_t signal; - /* Private segment size. */ - uint32_t private_segment_size; - /* Group segment size. */ - uint32_t group_segment_size; - /* Number of children kernel dispatches. */ - uint64_t kernel_dispatch_count; - /* Number of threads. */ - uint32_t omp_num_threads; - /* Debug purpose argument. */ - uint64_t debug; - /* Levels-var
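
To make the cross-compilation concern from the discussion concrete, here is a small sketch. The structure is copied from the patch above; the helper host_offset_of_omp_num_threads is a hypothetical name added for illustration. The point is that offsetof is evaluated by whichever compiler builds this file, so the value reflects the host layout and need not match the layout on the offload target.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as in the patch; see the comments there.  */
struct GOMP_hsa_kernel_dispatch
{
  void *queue;
  void *omp_data_memory;
  void *kernarg_address;
  uint64_t object;
  uint64_t signal;
  uint32_t private_segment_size;
  uint32_t group_segment_size;
  uint64_t kernel_dispatch_count;
  uint32_t omp_num_threads;
  uint64_t debug;
  uint64_t omp_level;
  struct GOMP_hsa_kernel_dispatch **children_dispatches;
};

/* Offset of omp_num_threads as laid out for the HOST that compiles this
   code.  With 4-byte pointers, for instance, the result would differ
   from the one obtained with 8-byte pointers.  */
static size_t
host_offset_of_omp_num_threads (void)
{
  return offsetof (struct GOMP_hsa_kernel_dispatch, omp_num_threads);
}
```

Building the type as a tree, laying it out for the target and reading the field offsets from there, as suggested in the quoted reply, avoids depending on the host layout.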
[hsa merge 01/10] Configury changes and new options
Hi, this patch contains changes to the configuration mechanism and offload bits, so that users can build compilers with HSA support. It is a re-post of https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which has already been approved by Jakub after a few changes (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html). Thanks, Martin 2016-01-13 Martin Jambor * Makefile.in (OBJS): Add new source files. (GTFILES): Add hsa.c. * common.opt (disable_hsa): New variable. (-Whsa): New warning. * config.in (ENABLE_HSA): New. * configure.ac: Treat hsa differently from other accelerators. (OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to $enable_offloading. (ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa. * doc/install.texi (Configuration): Document --with-hsa-runtime, --with-hsa-runtime-include, --with-hsa-runtime-lib and --with-hsa-kmt-lib. * doc/invoke.texi (-Whsa): Document. (hsa-gen-debug-stores): Likewise. * lto-wrapper.c (compile_images_for_offload_targets): Do not attempt to invoke offload compiler for hsa accelerator. * opts.c (common_handle_option): Determine whether HSA offloading should be performed. * params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter. libgomp/plugin/ * Makefrag.am: Add HSA plugin requirements. * configfrag.ac (HSA_RUNTIME_INCLUDE): New variable. (HSA_RUNTIME_LIB): Likewise. (HSA_RUNTIME_CPPFLAGS): Likewise. (HSA_RUNTIME_INCLUDE): New substitution. (HSA_RUNTIME_LIB): Likewise. (HSA_RUNTIME_LDFLAGS): Likewise. (hsa-runtime): New configure option. (hsa-runtime-include): Likewise. (hsa-runtime-lib): Likewise. (PLUGIN_HSA): New substitution variable. Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new configure options. (PLUGIN_HSA_CPPFLAGS): Likewise. (PLUGIN_HSA_LDFLAGS): Likewise. (PLUGIN_HSA_LIBS): Likewise. Check that we have access to HSA run-time. 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 44a18eb..ab9cbbf 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1297,6 +1297,11 @@ OBJS = \ graphite-sese-to-poly.o \ gtype-desc.o \ haifa-sched.o \ + hsa.o \ + hsa-gen.o \ + hsa-regalloc.o \ + hsa-brig.o \ + hsa-dump.o \ hw-doloop.o \ hwint.o \ ifcvt.o \ @@ -1321,6 +1326,7 @@ OBJS = \ ipa-icf.o \ ipa-icf-gimple.o \ ipa-reference.o \ + ipa-hsa.o \ ipa-ref.o \ ipa-utils.o \ ipa.o \ @@ -2404,6 +2410,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/sancov.c \ $(srcdir)/ipa-devirt.c \ $(srcdir)/internal-fn.h \ + $(srcdir)/hsa.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 49d347c..23e6ed7 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -239,6 +239,10 @@ Inserts call to __sanitizer_cov_trace_pc into every basic block. Variable bool dump_base_name_prefixed = false +; Flag whether HSA generation has been explicitely disabled +Variable +bool flag_disable_hsa = false + ### Driver @@ -593,6 +597,10 @@ Wfree-nonheap-object Common Var(warn_free_nonheap_object) Init(1) Warning Warn when attempting to free a non-heap object. +Whsa +Common Var(warn_hsa) Init(1) Warning +Warn when a function cannot be expanded to HSAIL. + Winline Common Var(warn_inline) Warning Warn when an inlined function cannot be inlined. diff --git a/gcc/config.in b/gcc/config.in index c00cd0f..c3340bb0 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -144,6 +144,12 @@ #endif +/* Define this to enable support for generating HSAIL. */ +#ifndef USED_FOR_TARGET +#undef ENABLE_HSA +#endif + + /* Define if gcc should always pass --build-id to linker. 
*/ #ifndef USED_FOR_TARGET #undef ENABLE_LD_BUILDID diff --git a/gcc/configure.ac b/gcc/configure.ac index 0a626e9..8d3a869 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -940,6 +940,13 @@ AC_SUBST(accel_dir_suffix) for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do tgt=`echo $tgt | sed 's/=.*//'` + + if echo "$tgt" | grep "^hsa" > /dev/null ; then +enable_hsa=1 + else +enable_offloading=1 + fi + if test x"$offload_targets" = x; then offload_targets=$tgt else @@ -948,7 +955,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do done AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets", [Define to offload targets, separated by commas.]) -if test x"$offload_targets" != x; then +if test x"$enable_offloading" != x; then AC_DEFINE(ENABLE_OFFLOADING, 1, [Define this to enable supp
[hsa merge 06/10] Pass manager changes
Hi, the pass manager changes required for HSA have already been committed to trunk so all that remains are these additions to the pass pipeline. This bit has already been approved by Richi in https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00996.html Thanks, Martin 2016-01-13 Martin Jambor Martin Liska * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail. * tree-pass.h (make_pass_gen_hsail): Declare. (make_pass_ipa_hsa): Likewise. diff --git a/gcc/passes.def b/gcc/passes.def index c593851..a6a1719 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -151,6 +151,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_ipa_cp); NEXT_PASS (pass_ipa_cdtor_merge); NEXT_PASS (pass_target_clone); + NEXT_PASS (pass_ipa_hsa); NEXT_PASS (pass_ipa_inline); NEXT_PASS (pass_ipa_pure_const); NEXT_PASS (pass_ipa_reference); @@ -388,6 +389,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_nrv); NEXT_PASS (pass_cleanup_cfg_post_optimizing); NEXT_PASS (pass_warn_function_noreturn); + NEXT_PASS (pass_gen_hsail); NEXT_PASS (pass_expand); diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index e8e8e48..b942a01 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -471,6 +471,7 @@ extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt); extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt); /* IPA Passes */ extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt); @@ -495,6 +496,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt); +extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt); extern 
ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
[hsa merge 00/10] Merge of HSA branch
Hi,

this is hopefully the last big re-post of the HSA patches.  We have
incorporated all the feedback and found and fixed a couple more bugs.
The complete patch set bootstraps and tests fine on x86_64-linux when
HSA is not enabled; when HSA is enabled, there are a few expected
warnings, which I will address as a follow-up together with more
testsuite changes.  The patches were specifically designed so that the
impact on people not enabling HSA should be minimal.

A last round of complete testing on an actual HSA-capable APU is still
underway and I won't have the results until tomorrow, but preliminary
results were good and I did not want to hold up these patches any
longer.

The libgomp, omp and configuration bits have been reviewed by Jakub, a
few other bits by Richi, but Honza should still review the IPA parts
and I suppose someone other than me should ack the hsa-* files, even
though I probably now have the authority to do it myself.

Thanks everybody for patience and feedback.  While we are of course
open to more of it, let's also hope the approval process will finish
soon, as it should now.

Martin
[hsa merge 02/10] Modifications to libgomp proper
Hi,

The patch below contains all changes to libgomp files except for the HSA plugin (which is in the following patch). The patch is a re-post of https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01288.html but we have incorporated a number of requests from the feedback. From the subsequent communication with Jakub, I have the feeling he is fine with the changes, but perhaps he or someone else would like to have one more look.

Thanks,

Martin

2016-01-13  Martin Jambor

include/
	* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
	(GOMP_VERSION_HSA): Likewise.
	(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
	(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
	(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
	(GOMP_TARGET_ARG_ID_MASK): Likewise.
	(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
	(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
	(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
	(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

libgomp/
	* libgomp-plugin.h (offload_target_type): New element OFFLOAD_TARGET_TYPE_HSA.
	* libgomp.h (gomp_target_task): New fields firstprivate_copies and args.
	(bool gomp_create_target_task): Updated.
	(gomp_device_descr): Extra parameter of run_func and async_run_func, new field can_run_func.
	* libgomp_g.h (GOMP_target_ext): Update prototype.
	* oacc-host.c (host_run): Added a new parameter args.
	* target.c (calculate_firstprivate_requirements): New function.
	(copy_firstprivate_data): Likewise.
	(gomp_target_fallback_firstprivate): Use them.
	(gomp_target_unshare_firstprivate): New function.
	(gomp_get_target_fn_addr): Allow returning NULL for shared memory devices.
	(GOMP_target): Do host fallback for all shared memory devices. Do not pass any args to plugins.
	(GOMP_target_ext): Introduce device-specific argument parameter args. Allow host fallback if device shares memory. Do not remap data if device has shared memory.
	(gomp_target_task_fn): Likewise. Also treat shared memory devices like host fallback for mappings.
	(GOMP_target_data): Treat shared memory devices like host fallback.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_update): Likewise.
	(GOMP_target_update_ext): Likewise. Also pass NULL as args to gomp_create_target_task.
	(GOMP_target_enter_exit_data): Likewise.
	(omp_target_alloc): Treat shared memory devices like host fallback.
	(omp_target_free): Likewise.
	(omp_target_is_present): Likewise.
	(omp_target_memcpy): Likewise.
	(omp_target_memcpy_rect): Likewise.
	(omp_target_associate_ptr): Likewise.
	(gomp_load_plugin_for_device): Also load can_run.
	* task.c (GOMP_PLUGIN_target_task_completion): Free firstprivate_copies.
	(gomp_create_target_task): Accept new argument args and store it to ttask.

liboffloadmic/plugin
	* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused parameter.
	(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h index dffd631..a8e7723 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -176,6 +176,7 @@ enum gomp_map_kind #define GOMP_DEVICE_NOT_HOST 4 #define GOMP_DEVICE_NVIDIA_PTX 5 #define GOMP_DEVICE_INTEL_MIC 6 +#define GOMP_DEVICE_HSA 7 #define GOMP_DEVICE_ICV -1 #define GOMP_DEVICE_HOST_FALLBACK -2 @@ -201,6 +202,7 @@ enum gomp_map_kind #define GOMP_VERSION 0 #define GOMP_VERSION_NVIDIA_PTX 1 #define GOMP_VERSION_INTEL_MIC 0 +#define GOMP_VERSION_HSA 0 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV)) #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x) @@ -228,4 +230,30 @@ enum gomp_map_kind #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x) #define GOMP_LAUNCH_OP_MAX 0x +/* Bitmask to apply in order to find out the intended device of a target + argument. */ +#define GOMP_TARGET_ARG_DEVICE_MASK ((1 << 7) - 1) +/* The target argument is significant for all devices. */ +#define GOMP_TARGET_ARG_DEVICE_ALL 0 + +/* Flag set when the subsequent element in the device-specific argument + values. */ +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7) + +/* Bitmask to apply to a target argument to find out the value identifier. 
*/ +#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8) +/* Target argument index of NUM_TEAMS. */ +#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8) +/* Target argument index of THREAD_LIMIT. */ +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8) + +/* If the value is directly embeded in target argument, it should be a
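For readers following the new GOMP_TARGET_ARG_* macros, here is a minimal sketch of how a device number, a value identifier and a small value might be packed into a single target argument. The mask and identifier values are copied from the hunk above. The shift of 16 and every name prefixed with sketch_ or SKETCH_ are assumptions made for illustration only, because the definition of GOMP_TARGET_ARG_VALUE_SHIFT is not visible in the quoted part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the quoted gomp-constants.h hunk.  */
#define GOMP_DEVICE_HSA 7
#define GOMP_TARGET_ARG_DEVICE_MASK ((1 << 7) - 1)
#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)

/* ASSUMPTION: the embedded value sits above the low 16 bits.  The real
   GOMP_TARGET_ARG_VALUE_SHIFT is not shown in the quoted text.  */
#define SKETCH_VALUE_SHIFT 16

/* Hypothetical helper: pack DEVICE, ID and a small signed VALUE into
   one pointer-sized integer.  */
static intptr_t
sketch_pack_target_arg (int device, int id, int value)
{
  assert ((device & ~GOMP_TARGET_ARG_DEVICE_MASK) == 0);
  assert ((id & ~GOMP_TARGET_ARG_ID_MASK) == 0);
  assert (value > -(1 << 15) && value < (1 << 15));
  /* Multiply rather than shift so that negative values stay well defined.  */
  return (intptr_t) value * ((intptr_t) 1 << SKETCH_VALUE_SHIFT) + id + device;
}

/* Hypothetical helper: the non-negative low 16 bits holding ID and DEVICE.  */
static int
sketch_low_bits (intptr_t arg)
{
  intptr_t unit = (intptr_t) 1 << SKETCH_VALUE_SHIFT;
  return (int) (((arg % unit) + unit) % unit);
}

static int
sketch_arg_device (intptr_t arg)
{
  return sketch_low_bits (arg) & GOMP_TARGET_ARG_DEVICE_MASK;
}

static int
sketch_arg_id (intptr_t arg)
{
  return sketch_low_bits (arg) & GOMP_TARGET_ARG_ID_MASK;
}

static int
sketch_arg_value (intptr_t arg)
{
  intptr_t unit = (intptr_t) 1 << SKETCH_VALUE_SHIFT;
  return (int) ((arg - sketch_low_bits (arg)) / unit);
}
```

Under these assumptions a consumer can apply the device mask first, skip arguments meant for other devices, and only then look at the identifier and the value.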
[hsa merge 07/10] IPA-HSA pass
Hi,

this patch contains IPA-related changes that we need to bring about for HSA. The patch is a re-post of https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00720.html but so far we have not received any feedback. Let me quote the original accompanying email here for reference:

When a target construct is gridified, the HSA GPU function is associated with the CPU function throughout the compilation, so that they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp declare target" marked functions emerge out of OMP expansion as one gimple function for both the host and the accelerator. However, at some point we need to create a special HSA function representation so that we can modify the behavior of a (very) few optimization passes for them.

Both are done by the following new IPA pass, which creates new HSA clones in these cases. Moreover, it redirects the appropriate call graph edges to be in between HSA implementations, marks HSA clones with the flatten attribute to minimize any call overhead (which is much more significant on GPUs) and makes sure both the CPU and GPU functions are coupled together and remain in the same LTO partition so that they can be registered together to libgomp.

Thanks,

Martin

2016-01-13  Martin Liska
	    Martin Jambor

	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* lto-partition.c: Include "hsa.h".
	(add_symbol_to_partition_1): Put hsa implementations into the same partition as host implementations.
	* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c new file mode 100644 index 000..dd47995 --- /dev/null +++ b/gcc/ipa-hsa.c @@ -0,0 +1,329 @@ +/* Callgraph based analysis of static variables. + Copyright (C) 2015-2016 Free Software Foundation, Inc. + Contributed by Martin Liska + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +/* Interprocedural HSA pass is responsible for creation of HSA clones. + For all these HSA clones, we emit HSAIL instructions and pass processing + is terminated. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "is-a.h" +#include "hash-set.h" +#include "vec.h" +#include "tree.h" +#include "tree-pass.h" +#include "function.h" +#include "basic-block.h" +#include "gimple.h" +#include "dumpfile.h" +#include "gimple-pretty-print.h" +#include "tree-streamer.h" +#include "stringpool.h" +#include "cgraph.h" +#include "print-tree.h" +#include "symbol-summary.h" +#include "hsa.h" + +namespace { + +/* If NODE is not versionable, warn about not emiting HSAIL and return false. + Otherwise return true. */ + +static bool +check_warn_node_versionable (cgraph_node *node) +{ + if (!node->local.versionable) +{ + warning_at (EXPR_LOCATION (node->decl), OPT_Whsa, + "could not emit HSAIL for function %s: function cannot be " + "cloned", node->name ()); + return false; +} + return true; +} + +/* The function creates HSA clones for all functions that were either + marked as HSA kernels or are callable HSA functions. Apart from that, + we redirect all edges that come from an HSA clone and end in another + HSA clone to connect these two functions. 
*/ + +static unsigned int +process_hsa_functions (void) +{ + struct cgraph_node *node; + + if (hsa_summaries == NULL) +hsa_summaries = new hsa_summary_t (symtab); + + FOR_EACH_DEFINED_FUNCTION (node) +{ + hsa_function_summary *s = hsa_summaries->get (node); + + /* A linked function is skipped. */ + if (s->m_binded_function != NULL) + continue; + + if (s->m_kind != HSA_NONE) + { + if (!check_warn_node_versionable (node)) + continue; + cgraph_node *clone = node->create_virtual_clone + (vec (), NULL, NULL, "hsa"); + TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); + + clone->force_output = true; + hsa_summari
[hsa merge 04/10] Avoid extraneous remapping in copy_gimple_seq_and_replace_locals
Hi,

this patch is new. It addresses a problem I outlined in https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00424.html and implements Jakub's suggestion in https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00614.html

I have refrained from bigger changes in struct copy_body_data in tree-inline.h as I think that such a cleanup should be done separately, but the structure could probably use some field reordering to remove padding.

I hope I have grasped it correctly and that the patch is OK for trunk.

Thanks,

Martin

2016-01-13  Martin Jambor

	* tree-inline.c (remap_decl): Use existing declarations if remapping a type and prevent_decl_creation_for_types.
	(replace_locals_stmt): Do an initial remapping of non-VLA typed decls first. Do real remapping with prevent_decl_creation_for_types set.
	* tree-inline.h (copy_body_data): New field prevent_decl_creation_for_types, moved remap_var_for_cilk to avoid padding.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 6bf2467..7b34288 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -340,8 +340,22 @@ remap_decl (tree decl, copy_body_data *id) return decl; } - /* If we didn't already have an equivalent for this declaration, - create one now. */ + /* When remapping a type within copy_gimple_seq_and_replace_locals, all + necessary DECLs have already been remapped and we do not want to duplicate + a decl coming from outside of the sequence we are copying. */ + if (!n + && id->prevent_decl_creation_for_types + && id->remapping_type_depth > 0 + && (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL)) +{ + if (id->do_not_unshare) + return decl; + else + return unshare_expr (decl); +} + + /* If we didn't already have an equivalent for this declaration, create one + now. */ if (!n) { /* Make a copy of the variable or label. */ @@ -5225,8 +5239,19 @@ replace_locals_stmt (gimple_stmt_iterator *gsip, /* This will remap a lot of the same decls again, but this should be harmless. 
*/ if (gimple_bind_vars (stmt)) - gimple_bind_set_vars (stmt, remap_decls (gimple_bind_vars (stmt), -NULL, id)); + { + tree old_var, decls = gimple_bind_vars (stmt); + + for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var)) + if (!can_be_nonlocal (old_var, id) + && ! variably_modified_type_p (TREE_TYPE (old_var), id->src_fn)) + remap_decl (old_var, id); + + gcc_checking_assert (!id->prevent_decl_creation_for_types); + id->prevent_decl_creation_for_types = true; + gimple_bind_set_vars (stmt, remap_decls (decls, NULL, id)); + id->prevent_decl_creation_for_types = false; + } } /* Keep iterating. */ diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h index d3e5229..4cc1f19 100644 --- a/gcc/tree-inline.h +++ b/gcc/tree-inline.h @@ -140,14 +140,17 @@ struct copy_body_data the originals have been mapped to a value rather than to a variable. */ hash_map *debug_map; - - /* Cilk keywords currently need to replace some variables that - ordinary nested functions do not. */ - bool remap_var_for_cilk; /* A map from the inlined functions dependence info cliques to equivalents in the function into which it is being inlined. */ hash_map *dependence_map; + + /* Cilk keywords currently need to replace some variables that + ordinary nested functions do not. */ + bool remap_var_for_cilk; + + /* Do not create new declarations when within type remapping. */ + bool prevent_decl_creation_for_types; }; /* Weights of constructions for estimate_num_insns. */
[hsa merge 03/10] HSA libgomp plugin
Hi, the patch below adds the HSA-specific plugin for libgomp. The plugin implements the interface mandated by libgomp and takes care of finding any available HSA devices, finalizing HSAIL code and running it on HSA-capable GPUs. This patch is a re-post of https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00716.html with a number of modifications requested by Jakub. Thanks, Martin 2016-01-13 Martin Jambor Martin Liska * plugin/plugin-hsa.c: New file. diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c new file mode 100644 index 000..d888493 --- /dev/null +++ b/libgomp/plugin/plugin-hsa.c @@ -0,0 +1,1493 @@ +/* Plugin for HSAIL execution. + + Copyright (C) 2013-2016 Free Software Foundation, Inc. + + Contributed by Martin Jambor and + Martin Liska . + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#include +#include +#include +#include +#include +#include +#include +#include "libgomp-plugin.h" +#include "gomp-constants.h" + +/* Keep the following GOMP prefixed structures in sync with respective parts of + the compiler. 
*/ + +/* Structure describing the run-time and grid properties of an HSA kernel + lauch. */ + +struct GOMP_kernel_launch_attributes +{ + /* Number of dimensions the workload has. Maximum number is 3. */ + uint32_t ndim; + /* Size of the grid in the three respective dimensions. */ + uint32_t gdims[3]; + /* Size of work-groups in the respective dimensions. */ + uint32_t wdims[3]; +}; + +/* Collection of information needed for a dispatch of a kernel from a + kernel. */ + +struct GOMP_hsa_kernel_dispatch +{ + /* Pointer to a command queue associated with a kernel dispatch agent. */ + void *queue; + /* Pointer to reserved memory for OMP data struct copying. */ + void *omp_data_memory; + /* Pointer to a memory space used for kernel arguments passing. */ + void *kernarg_address; + /* Kernel object. */ + uint64_t object; + /* Synchronization signal used for dispatch synchronization. */ + uint64_t signal; + /* Private segment size. */ + uint32_t private_segment_size; + /* Group segment size. */ + uint32_t group_segment_size; + /* Number of children kernel dispatches. */ + uint64_t kernel_dispatch_count; + /* Debug purpose argument. */ + uint64_t debug; + /* Levels-var ICV. */ + uint64_t omp_level; + /* Kernel dispatch structures created for children kernel dispatches. */ + struct GOMP_hsa_kernel_dispatch **children_dispatches; + /* Number of threads. */ + uint32_t omp_num_threads; +}; + +/* Part of the libgomp plugin interface. Return the name of the accelerator, + which is "hsa". */ + +const char * +GOMP_OFFLOAD_get_name (void) +{ + return "hsa"; +} + +/* Part of the libgomp plugin interface. Return the specific capabilities the + HSA accelerator have. */ + +unsigned int +GOMP_OFFLOAD_get_caps (void) +{ + return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400; +} + +/* Part of the libgomp plugin interface. Identify as HSA accelerator. 
*/ + +int +GOMP_OFFLOAD_get_type (void) +{ + return OFFLOAD_TARGET_TYPE_HSA; +} + +/* Return the libgomp version number we're compatible with. There is + no requirement for cross-version compatibility. */ + +unsigned +GOMP_OFFLOAD_version (void) +{ + return GOMP_VERSION; +} + +/* Flag to decide whether print to stderr information about what is going on. + Set in init_debug depending on environment variables. */ + +static bool debug; + +/* Flag to decide if the runtime should suppress a possible fallback to host + execution. */ + +static bool suppress_host_fallback; + +/* Initialize debug and suppress_host_fallback according to the environment. */ + +static void +init_enviroment_variables (void) +{ + if (getenv ("HSA_DEBUG")) +debug = true; + else +debug = false; + + if (getenv ("HSA_SUPPRESS_HOST_FALLBACK")) +suppress_host_fallback = true; + else +suppress
[hsa merge 10/10] HSA register allocator
Hi,

because the HSA back end is not based on RTL, we need our own register allocator, and it is in this patch. The allocator was written by Michael Matz, and I have put it into a separate email so that I can add him to CC, because he is much better placed to answer any questions or review comments.

Thanks,

Martin

2016-01-13  Michael Matz
	    Martin Jambor

	* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c new file mode 100644 index 000..5a42beb --- /dev/null +++ b/gcc/hsa-regalloc.c @@ -0,0 +1,719 @@ +/* HSAIL IL Register allocation and out-of-SSA. + Copyright (C) 2013-2016 Free Software Foundation, Inc. + Contributed by Michael Matz + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "is-a.h" +#include "vec.h" +#include "tree.h" +#include "dominance.h" +#include "cfg.h" +#include "cfganal.h" +#include "function.h" +#include "bitmap.h" +#include "dumpfile.h" +#include "cgraph.h" +#include "print-tree.h" +#include "cfghooks.h" +#include "symbol-summary.h" +#include "hsa.h" + + +/* Process a PHI node PHI of basic block BB as a part of naive out-f-ssa. 
*/ + +static void +naive_process_phi (hsa_insn_phi *phi) +{ + unsigned count = phi->operand_count (); + for (unsigned i = 0; i < count; i++) +{ + gcc_checking_assert (phi->get_op (i)); + hsa_op_base *op = phi->get_op (i); + hsa_bb *hbb; + edge e; + + if (!op) + break; + + e = EDGE_PRED (phi->m_bb, i); + if (single_succ_p (e->src)) + hbb = hsa_bb_for_bb (e->src); + else + { + basic_block old_dest = e->dest; + hbb = hsa_init_new_bb (split_edge (e)); + + /* If switch insn used this edge, fix jump table. */ + hsa_bb *source = hsa_bb_for_bb (e->src); + hsa_insn_sbr *sbr; + if (source->m_last_insn + && (sbr = dyn_cast (source->m_last_insn))) + sbr->replace_all_labels (old_dest, hbb->m_bb); + } + + hsa_build_append_simple_mov (phi->m_dest, op, hbb); +} +} + +/* Naive out-of SSA. */ + +static void +naive_outof_ssa (void) +{ + basic_block bb; + + hsa_cfun->m_in_ssa = false; + + FOR_ALL_BB_FN (bb, cfun) + { +hsa_bb *hbb = hsa_bb_for_bb (bb); +hsa_insn_phi *phi; + +for (phi = hbb->m_first_phi; +phi; +phi = phi->m_next ? as_a (phi->m_next): NULL) + naive_process_phi (phi); + +/* Zap PHI nodes, they will be deallocated when everything else will. */ +hbb->m_first_phi = NULL; +hbb->m_last_phi = NULL; + } +} + +/* Return register class number for the given HSA TYPE. 0 means the 'c' one + bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class + and 3 for 'q' 128 bit class. 
*/ + +static int +m_reg_class_for_type (BrigType16_t type) +{ + switch (type) +{ +case BRIG_TYPE_B1: + return 0; + +case BRIG_TYPE_U8: +case BRIG_TYPE_U16: +case BRIG_TYPE_U32: +case BRIG_TYPE_S8: +case BRIG_TYPE_S16: +case BRIG_TYPE_S32: +case BRIG_TYPE_F16: +case BRIG_TYPE_F32: +case BRIG_TYPE_B8: +case BRIG_TYPE_B16: +case BRIG_TYPE_B32: +case BRIG_TYPE_U8X4: +case BRIG_TYPE_S8X4: +case BRIG_TYPE_U16X2: +case BRIG_TYPE_S16X2: +case BRIG_TYPE_F16X2: + return 1; + +case BRIG_TYPE_U64: +case BRIG_TYPE_S64: +case BRIG_TYPE_F64: +case BRIG_TYPE_B64: +case BRIG_TYPE_U8X8: +case BRIG_TYPE_S8X8: +case BRIG_TYPE_U16X4: +case BRIG_TYPE_S16X4: +case BRIG_TYPE_F16X4: +case BRIG_TYPE_U32X2: +case BRIG_TYPE_S32X2: +case BRIG_TYPE_F32X2: + return 2; + +case BRIG_TYPE_B128: +case BRIG_TYPE_U8X16: +case BRIG_TYPE_S8X16: +case BRIG_TYPE_U16X8: +case BRIG_TYPE_S16X8: +case BRIG_TYPE_F16X8: +case BRIG_TYPE_U32X4: +case BRIG_TYPE_U64X2: +case BRIG_TYPE_S32X4: +case BRIG_TYPE_S64X2: +case BRIG_TYPE_F32X4: +case BRIG_TYPE_F64X2: + return 3; + +default: + gcc_unreachable (); +} +} + +/* If the Ith
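As an aside on naive_process_phi above: the core idea of a naive out-of-SSA step is that every PHI operand becomes a plain move appended to the matching predecessor block. The following is only a toy sketch of that idea. All toy_ types and functions are hypothetical and are not the hsa_* classes from the patch, and the sketch assumes every predecessor has a single successor, so it leaves out the edge splitting and jump-table fix-up that the patch performs.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PREDS 4
#define TOY_MAX_MOVS 8

/* Hypothetical toy IR: registers are plain integers.  */
struct toy_mov
{
  int dest;
  int src;
};

struct toy_block
{
  struct toy_mov movs[TOY_MAX_MOVS];
  size_t n_movs;
};

/* Operand I of the PHI flows in from predecessor block PREDS[I].  */
struct toy_phi
{
  int dest;
  size_t n_ops;
  int ops[TOY_MAX_PREDS];
  struct toy_block *preds[TOY_MAX_PREDS];
};

/* Append "DEST = SRC" to BB.  Return 0 on success, -1 if BB is full.  */
static int
toy_append_mov (struct toy_block *bb, int dest, int src)
{
  if (bb->n_movs >= TOY_MAX_MOVS)
    return -1;
  bb->movs[bb->n_movs].dest = dest;
  bb->movs[bb->n_movs].src = src;
  bb->n_movs++;
  return 0;
}

/* Replace PHI by one move per operand, placed in the predecessor the
   operand arrives from.  ASSUMPTION: each predecessor has exactly one
   successor, so no edge needs to be split.  */
static int
toy_lower_phi (const struct toy_phi *phi)
{
  assert (phi->n_ops <= TOY_MAX_PREDS);
  for (size_t i = 0; i < phi->n_ops; i++)
    if (toy_append_mov (phi->preds[i], phi->dest, phi->ops[i]) != 0)
      return -1;
  return 0;
}
```

Once every PHI of a block has been lowered this way, the PHI list itself can simply be dropped, which corresponds to what naive_outof_ssa does when it resets m_first_phi and m_last_phi.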
[hsa merge 08/10] HSAIL BRIG description header file
Hi,

the following patch adds a description of BRIG (the binary representation of HSAIL). It is within a single header file describing the binary structures and constants of the format.

The file comes from the HSA Foundation (I have only added the HSA_BRIG_FORMAT_H macro and check and removed some weird comments which are not present in proposed future versions of the file) and is licensed under "University of Illinois/NCSA Open Source License."

The license is "GPL-compatible" according to FSF (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses) so I believe we can have it in GCC. Nevertheless, it is not GPL and there is no copyright assignment for it, but the situation is hopefully analogous to some other libraries that have their upstream elsewhere but which we ship as part of GCC.

In the previous posting of this patch (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I requested permission from the steering committee to include this file with a different upstream in GCC. I have not received an official reply but since I have been chosen to be the HSA maintainer, I tend to think there were no legal objections against HSA going forward, including this file.

Thanks,

Martin

2015-12-04  Martin Jambor

	* hsa-brig-format.h: New file.

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h new file mode 100644 index 000..6e2fe75 --- /dev/null +++ b/gcc/hsa-brig-format.h @@ -0,0 +1,1277 @@ +// University of Illinois/NCSA +// Open Source License +// +// Copyright (c) 2013-2015, Advanced Micro Devices, Inc. +// All rights reserved. 
+// +// Developed by: +// +// HSA Team +// +// Advanced Micro Devices, Inc +// +// www.amd.com +// +// Permission is hereby granted, free of charge, to any person obtaining a copy of +// this software and associated documentation files (the "Software"), to deal with +// the Software without restriction, including without limitation the rights to +// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +// of the Software, and to permit persons to whom the Software is furnished to do +// so, subject to the following conditions: +// +// * Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimers. +// +// * Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimers in the +// documentation and/or other materials provided with the distribution. +// +// * Neither the names of the HSA Team, University of Illinois at +// Urbana-Champaign, nor the names of its contributors may be used to +// endorse or promote products derived from this Software without specific +// prior written permission. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS +// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE +// SOFTWARE. 
+ +#ifndef HSA_BRIG_FORMAT_H +#define HSA_BRIG_FORMAT_H + +typedef uint32_t BrigVersion32_t; + +enum BrigVersion { + +BRIG_VERSION_HSAIL_MAJOR = 1, +BRIG_VERSION_HSAIL_MINOR = 0, +BRIG_VERSION_BRIG_MAJOR = 1, +BRIG_VERSION_BRIG_MINOR = 0 +}; + +typedef uint8_t BrigAlignment8_t; + +typedef uint8_t BrigAllocation8_t; + +typedef uint8_t BrigAluModifier8_t; + +typedef uint8_t BrigAtomicOperation8_t; + +typedef uint32_t BrigCodeOffset32_t; + +typedef uint8_t BrigCompareOperation8_t; + +typedef uint16_t BrigControlDirective16_t; + +typedef uint32_t BrigDataOffset32_t; + +typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t; + +typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t; + +typedef BrigDataOffset32_t BrigDataOffsetString32_t; + +typedef uint8_t BrigExecutableModifier8_t; + +typedef uint8_t BrigImageChannelOrder8_t; + +typedef uint8_t BrigImageChannelType8_t; + +typedef uint8_t BrigImageGeometry8_t; + +typedef uint8_t BrigImageQuery8_t; + +typedef uint16_t BrigKind16_t; + +typedef uint8_t BrigLinkage8_t; + +typedef uint8_t BrigMachineModel8_t; + +typedef uint8_t BrigMemoryModifier8_t; + +typedef uint8_t BrigMemoryOrder8_t; + +typedef uint8_t BrigMemoryScope8_t; + +typedef uint16_t BrigOpcode16_t; + +typedef uint32_t BrigOperandOffset32_t; + +typedef uint8_t BrigPack8_t; + +typedef uint8_t BrigProfile8_t; + +typedef uint16_t BrigRegisterKind16_t; + +typedef uint8_t BrigRound8_t; + +typedef uint8_t BrigSamplerAddressing8_t; + +typedef uint8_t BrigSamplerCoordNormalization8_t; + +typedef uint8_t BrigSamplerFilter8_t; + +typedef uint8_t
[hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
Hi,

the patch in this email contains the changes to make our OpenMP lowering and expansion machinery produce GPU kernels for a certain limited class of loops. The following is a re-post of https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00718.html with a fair amount of incorporated feedback, almost all of which has been posted in https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01884.html.

Thanks,

Martin

2016-01-13  Martin Jambor

gcc/
	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle GF_OMP_FOR_KIND_GRID_LOOP.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_grid_body): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple.def (GIMPLE_OMP_GRID_BODY): New.
	* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY, GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and GF_OMP_TEAMS_GRID_PHONY.
	(gimple_statement_omp_single_layout): Updated comments.
	(gimple_build_omp_grid_body): New function.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
	(gimple_omp_for_grid_phony): New function.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_parallel_grid_phony): Likewise.
	(gimple_omp_parallel_set_grid_phony): Likewise.
	(gimple_omp_teams_grid_phony): Likewise.
	(gimple_omp_teams_set_grid_phony): Likewise.
	(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
	* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
	(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
	(BUILT_IN_GOMP_TARGET): Updated type.
	* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
	(adjust_for_condition): New function.
	(get_omp_for_step_from_incr): Likewise.
	(extract_omp_for_data): Moved parts to adjust_for_condition and get_omp_for_step_from_incr.
	(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
	(fixup_child_record_type): Bail out if receiver_decl is NULL.
	(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Do not create child functions for phony constructs.
	(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
	(scan_omp_1_op): Checking assert we are not remapping to ERROR_MARK. Also handle GIMPLE_OMP_GRID_BODY.
	(parallel_needs_hsa_kernel_p): New function.
	(expand_parallel_call): Register appropriate parallel child functions as HSA kernels.
	(grid_launch_attributes_trees): New type.
	(grid_attr_trees): New variable.
	(grid_create_kernel_launch_attr_types): New function.
	(grid_insert_store_range_dim): Likewise.
	(grid_get_kernel_launch_attributes): Likewise.
	(get_target_argument_identifier_1): Likewise.
	(get_target_argument_identifier): Likewise.
	(get_target_argument_value): Likewise.
	(push_target_argument_according_to_value): Likewise.
	(get_target_arguments): Likewise.
	(expand_omp_target): Call get_target_arguments instead of looking up for teams and thread limit.
	(grid_expand_omp_for_loop): New function.
	(grid_arg_decl_map): New type.
	(grid_remap_kernel_arg_accesses): New function.
	(grid_expand_target_kernel_body): New function.
	(expand_omp): Call it.
	(lower_omp_for): Do not emit phony constructs.
	(lower_omp_taskreg): Do not emit phony constructs but create for them a temporary variable receiver_decl.
	(lower_omp_taskreg): Do not emit phony constructs.
	(lower_omp_teams): Likewise.
	(lower_omp_grid_body): New function.
	(lower_omp_1): Call it.
	(grid_reg_assignment_to_local_var_p): New function.
	(grid_seq_only_contains_local_assignments): Likewise.
	(grid_find_single_omp_among_assignments_1): Likewise.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Likewise.
	(grid_target_follows_gridifiable_pattern): Likewise.
	(grid_remap_prebody_decls): Likewise.
	(grid_copy_leading_local_assignments): Likewise.
	(grid_process_kernel_body_copy): Likewise.
	(grid_attempt_target_gridification): Likewise.
	(grid_gridify_all_targets_stmt): Likewise.
	(grid_gridify_all_targets): Likewise.
	(execute_lower_omp): Call grid_gridify_all_targets.
	(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
	* tree-core.h (omp_clause_code): Added
Re: [hsa merge 08/10] HSAIL BRIG description header file
Hi,

On Thu, Jan 14, 2016 at 05:18:56PM -0800, Ian Lance Taylor wrote:
> Jakub Jelinek writes:
>
> > On Wed, Jan 13, 2016 at 06:39:33PM +0100, Martin Jambor wrote:
> >> the following patch adds a BRIG (binary representation of HSAIL)
> >> representation description. It is within a single header file
> >> describing the binary structures and constants of the format.
> >>
> >> The file comes from the HSA Foundation (I have only added the
> >> HSA_BRIG_FORMAT_H macro and check and removed some weird comments
> >> which are not present in proposed future versions of the file) and is
> >> licensed under "University of Illinois/NCSA Open Source License."
> >>
> >> The license is "GPL-compatible" according to FSF
> >> (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
> >> so I believe we can have it in GCC. Nevertheless, it is not GPL and
> >> there is no copyright assignment for it, but the situation is
> >> hopefully analogous to some other libraries that have their upstream
> >> elsewhere but we ship them as part of the GCC.
> >>
> >> In the previous posting of this patch
> >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00721.html) I have
> >> requested a permission from the steering committee to include this file
> >> with a different upstream in GCC. I have not received an official
> >> reply but since I have been chosen to be the HSA maintainer, I tend to
> >> think there were no legal objections against HSA going forward,
> >> including this file.
>
> Martin, could you ask the HSA Foundation or AMD or whoever if there is
> any way they could remove the second requirement of the license? It
> adds yet another case where anybody distributing GCC has to list yet
> another copyright notice.

I will raise this with the HSA PRM group and perhaps there is a slight chance that they will change this in the upcoming version of HSAIL. But it is not going to happen soon enough.

>
> Barring that, I would personally prefer that you write your own version
> of this header file, defining the constants and structs that you need.
> That's basically what we've done for ELF and COFF and Mach-O, several
> times over. For example, libiberty/simple-object-elf.c.

Well, if we have done something like this before, I can go through the exercise of copy'n'pasting everything from the PDF specification, if that allows us to "own" the file and put it under GPL 3. But I must say I do not know. It is going to be a somewhat tedious job (and it would be good to somehow double-check that I made no mistakes) but it is certainly doable. I guess I will embark on it after going through the rest of the review (unless someone here tells me I should not, that is).

>
> Barring that, I agree with Jakub that this looks like something that
> should go in the top-level include subdirectory rather than the gcc
> subdirectory.

Even if I "create" a copy of our own? But sure, no problem.

Martin
Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)
Thanks Jakub and Alex,

I have committed the following to the branch to address your comments:

2016-01-15  Martin Jambor

	* gimple.h: Fixed comment of gimple_statement_omp_single_layout.
	* omp-low.c (get_target_argument_value): Fixed spelling in its comment.
	(push_target_argument_according_to_value): Likewise.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): Renamed to OMP_CLAUSE__GRIDDIM__DIMENSION.

---
 gcc/gimple.h            |  2 +-
 gcc/omp-low.c           | 12 ++--
 gcc/tree-pretty-print.c |  2 +-
 gcc/tree.h              |  5 +
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h index 7eef07c..6d15dab 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -730,7 +730,7 @@ struct GTY((tag("GSS_OMP_CONTINUE"))) tree control_use; }; -/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */ +/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */ struct GTY((tag("GSS_OMP_SINGLE_LAYOUT"))) gimple_statement_omp_single_layout : public gimple_statement_omp diff --git a/gcc/omp-low.c b/gcc/omp-low.c index c534f5c..616c5bd 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -12741,7 +12741,7 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi, if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_) continue; - unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause); + unsigned dim = OMP_CLAUSE__GRIDDIM__DIMENSION (clause); max_dim = MAX (dim, max_dim); grid_insert_store_range_dim (gsi, lattrs, @@ -12788,7 +12788,7 @@ get_target_argument_identifier (int device, bool subseqent_param, int id) return fold_convert (ptr_type_node, t); } -/* Return a target argument consisiting of DEVICE identifier, value identifier +/* Return a target argument consisting of DEVICE identifier, value identifier ID, and the actual VALUE. 
*/ static tree @@ -12806,8 +12806,8 @@ get_target_argument_value (gimple_stmt_iterator *gsi, int device, int id, } /* If VALUE is an integer constant greater than -2^15 and smaller than 2^15, - push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it, - otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two + push one argument to ARGS with both the DEVICE, ID and VALUE embedded in it, + otherwise push an identifier (with DEVICE and ID) and the VALUE in two arguments. */ static void @@ -17693,7 +17693,7 @@ grid_attempt_target_gridification (gomp_target *target, ws = build_zero_cst (uint32_type_node); tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_); - OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i); + OMP_CLAUSE__GRIDDIM__DIMENSION (c) = i; OMP_CLAUSE__GRIDDIM__SIZE (c) = gs; OMP_CLAUSE__GRIDDIM__GROUP (c) = ws; OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target); @@ -17749,7 +17749,7 @@ grid_gridify_all_targets (gimple_seq *body_p) memset (&wi, 0, sizeof (wi)); walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi); } - + /* Main entry point. 
*/ diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 31cea10..9c13d84 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -944,7 +944,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags) case OMP_CLAUSE__GRIDDIM_: pp_string (pp, "_griddim_("); - pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause)); + pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause)); pp_colon (pp); dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags, false); diff --git a/gcc/tree.h b/gcc/tree.h index e885ea1..9b987bb 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1636,12 +1636,9 @@ extern void protected_set_expr_location (tree, location_t); #define OMP_CLAUSE_TILE_LIST(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0) -#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \ +#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\ ->omp_clause.subcode.dimension) -#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \ - (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\ - ->omp_clause.subcode.dimension = (DIMENSION)) #define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0) #define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \ -- 2.6.4
Re: [hsa merge 07/10] IPA-HSA pass
On Thu, Jan 14, 2016 at 01:58:58PM +0100, Jakub Jelinek wrote: > Otherwise LGTM. > > Jakub Thanks Jakub, I have committed the following patch from Martin Liska that addresses your comments. Martin 2016-01-15 Martin Liska * ipa-hsa.c (process_hsa_functions): Fixed coding style. (ipa_hsa_read_section): Likewise. (ipa_hsa_read_section): Likewise. (pass_ipa_hsa::gate): Removed in_lto_p from the condition. --- gcc/ipa-hsa.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c index dd47995..769657f 100644 --- a/gcc/ipa-hsa.c +++ b/gcc/ipa-hsa.c @@ -86,8 +86,9 @@ process_hsa_functions (void) { if (!check_warn_node_versionable (node)) continue; - cgraph_node *clone = node->create_virtual_clone - (vec (), NULL, NULL, "hsa"); + cgraph_node *clone + = node->create_virtual_clone (vec (), + NULL, NULL, "hsa"); TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); clone->force_output = true; @@ -102,8 +103,9 @@ process_hsa_functions (void) { if (!check_warn_node_versionable (node)) continue; - cgraph_node *clone = node->create_virtual_clone - (vec (), NULL, NULL, "hsa"); + cgraph_node *clone + = node->create_virtual_clone (vec (), + NULL, NULL, "hsa"); TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); if (!cgraph_local_p (node)) @@ -209,8 +211,8 @@ static void ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data, size_t len) { - const struct lto_function_header *header = -(const struct lto_function_header *) data; + const struct lto_function_header *header += (const struct lto_function_header *) data; const int cfg_offset = sizeof (struct lto_function_header); const int main_offset = cfg_offset + header->cfg_size; const int string_offset = main_offset + header->main_size; @@ -221,9 +223,9 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data, lto_input_block ib_main ((const char *) data + main_offset, header->main_size, file_data->mode_table); - data_in = -lto_data_in_create 
(file_data, (const char *) data + string_offset, - header->string_size, vNULL); + data_in += lto_data_in_create (file_data, (const char *) data + string_offset, + header->string_size, vNULL); count = streamer_read_uhwi (&ib_main); for (i = 0; i < count; i++) @@ -317,7 +319,7 @@ public: bool pass_ipa_hsa::gate (function *) { - return hsa_gen_requested_p () || in_lto_p; + return hsa_gen_requested_p (); } } // anon namespace -- 2.6.4
Re: [hsa merge 09/10] Majority of the HSA back-end
Hi, thanks Jakub. Below you'll find a patch, which is mostly the work of Martin Liska, that should address all the review comments. We then also went over the "XXX" marks (my bad that I forgot that Michael uses this mark), removed half of them and turned the rest into TODOs. Let me just quickly answer two comments as well: On Thu, Jan 14, 2016 at 03:05:33PM +0100, Jakub Jelinek wrote: > On Wed, Jan 13, 2016 at 06:39:34PM +0100, Martin Jambor wrote: > ... > > +#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines > > have " \ > > + "undefined semantics within target regions, support for HSA ignores them" > > Well, if you don't support them in HSA target regions, you'd better punt and > not error on them. We don't error; apart from issuing a warning we basically ignore them. I believe we can do it even in the long term and that it is in fact useful because the standard says that the "effect" of these routines is "unspecified" if they get called from a target region. Perhaps this is even something we should warn about earlier in omp lowering/expansion. ...
> > +unsigned > > +hsa_internal_fn::get_arity () > > +{ > > + switch (m_fn) > > +{ > > +case IFN_ACOS: > > +case IFN_ASIN: > > +case IFN_ATAN: > > +case IFN_COS: > > +case IFN_EXP: > > +case IFN_EXP10: > > +case IFN_EXP2: > > +case IFN_EXPM1: > > +case IFN_LOG: > > +case IFN_LOG10: > > +case IFN_LOG1P: > > +case IFN_LOG2: > > +case IFN_LOGB: > > +case IFN_SIGNIFICAND: > > +case IFN_SIN: > > +case IFN_SQRT: > > +case IFN_TAN: > > +case IFN_CEIL: > > +case IFN_FLOOR: > > +case IFN_NEARBYINT: > > +case IFN_RINT: > > +case IFN_ROUND: > > +case IFN_TRUNC: > > + return 1; > > +case IFN_ATAN2: > > +case IFN_COPYSIGN: > > +case IFN_FMOD: > > +case IFN_POW: > > +case IFN_REMAINDER: > > +case IFN_SCALB: > > +case IFN_LDEXP: > > + return 2; > > + break; > > +case IFN_CLRSB: > > +case IFN_CLZ: > > +case IFN_CTZ: > > +case IFN_FFS: > > +case IFN_PARITY: > > +case IFN_POPCOUNT: > > +default: > > + gcc_unreachable (); > > There are various other IFNs (e.g. for __builtin_{add,sub,mul}_overflow, > lots of others). How do you ensure you don't ICE on those? Martin added a comment explaining this. This can only be reached when we already know we are processing a known builtin, filtered by gen_hsa_insn_for_internal_fn_call. Thanks for looking at the code, Martin 2016-01-15 Martin Liska Martin Jambor * hsa-brig.c (struct function_linkage_pair): Fix GNU coding style and replace sprintf with snprintf. (hsa_brig_section::init): Likewise. (hsa_brig_section::output): Likewise. (hsa_brig_section::get_ptr_by_offset): Likewise. (brig_string_slot_hasher::hash): Likewise. (brig_string_slot_hasher::equal): Likewise. (brig_string_slot_hasher::remove): Likewise. (brig_emit_string): Likewise. (brig_init): Likewise. (emit_directive_variable): Likewise. (emit_function_directives): Likewise. (emit_bb_label_directive): Likewise. (emit_immediate_scalar_to_buffer): Likewise. (hsa_op_immed::emit_to_buffer): Likewise. (emit_immediate_operand): Likewise. (emit_address_operand): Likewise. 
(emit_memory_insn): Likewise. (emit_alloca_insn): Likewise. (emit_cmp_insn): Likewise. (emit_branch_insn): Likewise. (emit_switch_insn): Likewise. (emit_call_insn): Likewise. (emit_arg_block_insn): Likewise. (emit_packed_insn): Likewise. (emit_basic_insn): Likewise. (hsa_brig_emit_function): Likewise. (hsa_output_global_variables): Likewise. (hsa_output_kernels): Likewise. (hsa_output_libgomp_mapping): Likewise. (hsa_output_brig): Likewise. * hsa-dump.c (dump_hsa_immed): Likewise. (dump_hsa_insn_1): Likewise. * hsa-gen.c (hsa_symbol::total_byte_size): Likewise. (hsa_init_simple_builtins): Likewise. (hsa_init_data_for_cfun): Likewise. (hsa_type_for_scalar_tree_type): Likewise. (get_symbol_for_decl): Likewise. (hsa_get_host_function): Likewise. (hsa_op_immed::hsa_op_immed): Likewise. (hsa_insn_mem::hsa_insn_mem): Likewise. (hsa_insn_atomic::hsa_insn_atomic): Likewise. (hsa_insn_seg::hsa_insn_seg): Likewise. (hsa_insn_srctype::hsa_insn_srctype)
Re: [hsa merge 10/10] HSA register allocator
Hi, On Thu, Jan 14, 2016 at 03:41:34PM +0100, Jakub Jelinek wrote: > On Wed, Jan 13, 2016 at 06:39:35PM +0100, Martin Jambor wrote: > > +for (phi = hbb->m_first_phi; > > +phi; > > +phi = phi->m_next ? as_a (phi->m_next): NULL) > > Space before : > > Ok with that change. > I have committed the following patch from Martin to address this and a few other code style issues. Thanks, Martin 2016-01-15 Martin Liska * hsa-regalloc.c (naive_outof_ssa): Fixed coding style. (linear_scan_regalloc): Likewise. (regalloc): Likewise. --- gcc/hsa-regalloc.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c index 5a42beb..f8e83ecf 100644 --- a/gcc/hsa-regalloc.c +++ b/gcc/hsa-regalloc.c @@ -90,7 +90,7 @@ naive_outof_ssa (void) for (phi = hbb->m_first_phi; phi; -phi = phi->m_next ? as_a (phi->m_next): NULL) +phi = phi->m_next ? as_a (phi->m_next) : NULL) naive_process_phi (phi); /* Zap PHI nodes, they will be deallocated when everything else will. */ @@ -525,7 +525,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes) else after_end_number = insn_order; /* Everything live-out in this BB has at least an end point - after us. */ +after us. */ EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi) note_lr_end (ind2reg[bit], after_end_number); @@ -549,7 +549,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes) } /* Everything live-in in this BB has a start point before - our first insn. */ +our first insn. */ int before_start_number; if (hbb->m_first_insn) before_start_number = hbb->m_first_insn->m_number; @@ -570,7 +570,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes) are defined at the start of the routine (prologue). */ if (ind2reg[i]->m_lr_begin == insn_order) ind2reg[i]->m_lr_begin = 0; - /* All regs that have no use but a def will have lr_end == 0, + /* All regs that have no use but a def will have lr_end == 0, they are actually live from def until after the insn they are defined in. 
*/ if (ind2reg[i]->m_lr_end == 0) @@ -672,7 +672,7 @@ regalloc (void) basic_block bb; m_reg_class_desc classes[4]; - /* If there are no registers used in the function, exit right away. */ + /* If there are no registers used in the function, exit right away. */ if (hsa_cfun->m_reg_count == 0) return; -- 2.6.4
Re: [hsa merge 07/10] IPA-HSA pass
Hi, On Fri, Jan 15, 2016 at 04:01:49PM +0100, Jakub Jelinek wrote: > On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote: > > @@ -317,7 +319,7 @@ public: > > bool > > pass_ipa_hsa::gate (function *) > > { > > - return hsa_gen_requested_p () || in_lto_p; > > + return hsa_gen_requested_p (); > > } > > > > } // anon namespace > > I actually didn't mean this, I mean more of: > return (hsa_gen_requested_p () > #ifdef ENABLE_HSA > || in_lto_p > #endif >); > or so. Unless you arrange in lto-wrapper or where that if > HSA is enabled in any LTO input source, then it is enabled also in > lto1. If you do that, your change is fine. > This pass only creates HSA specific clones of ungridified target and parallel regions and functions marked with declare target. Whether or not any HSAIL is emitted is then controlled in the hsa-gen pass gate. The in_lto_p part was in fact a relic of a previous implementation. So while I agree that making such a change to lto-wrapper would be beneficial (although then we should limit its activity only to those nodes which come from enabled units), the change above does not make the current situation worse. I will make sure to look into lto-wrapper but meanwhile I still prefer the new condition. We have tested the new change: we LTO-compiled code with HSA enabled, LTO-linked it with HSA disabled, and: 1) if there was no gridified loop, the result was as if HSA had been disabled from the start 2) if there was a gridified kernel, the compiler compiled the kernel for the host but did not register it with libgomp and it ended up as an unreachable function. How do other accelerators cope with the situation when half of the application is compiled with the accelerator disabled? (Would some of their calls to GOMP_target_ext lead to abort?) Martin
Re: [hsa merge 08/10] HSAIL BRIG description header file
On Fri, Jan 15, 2016 at 01:03:35PM +0100, Jakub Jelinek wrote: > On Fri, Jan 15, 2016 at 11:37:32AM +0100, Jakub Jelinek wrote: > > On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote: > > > > Martin, could you ask the HSA Foundation or AMD or whoever if there is > > > > any way they could remove the second requirement of the license? It > > > > adds yet another case where anybody distributing GCC has to list yet > > > > another copyright notice. > > > > > > I will raise this with the HSA PRM group and perhaps there is a slight > > > chance that they will change this in the upcoming version of HSAIL. > > > But it is not going to happen soon enough. > > > > Under what license is > > http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm > > ? Sounds the same as the pdf to me. > > Unlike the pdf version thereof, you could grab the ... chunks > > out of this fairly easily with recursive wget and some quick scripting. > > E.g. > for i in `seq 2 123`; do sed > 's/\r$//;s/</]*>/\n\n/g;s/<\/pre>/\n<\/pre>\n/g;s/ name=[^>]*><\/a>//g' $i | sed -n '/^$/,/^<\/pre>$/{/^<.*pre>$/d;p}'; done > on downloaded (in the order of appearance in the toc) files, I get > following, which while it doesn't compile, I suppose some manual reordering > and if it is needed in C, also e.g. in case of typedef BrigModuleHeader* > BrigModule_t; adding > struct before BrigModuleHeader or turning that struct also into a typedef, > might make it work. > Now the question is if it covers all you care about. > Yes it does. We have massaged it just a little and it works fine (and the compiler is also basically the same binary-wise). So we will go with the following hsa-brig-format.h (in its old location in gcc/). Thanks for this input, it really helped, Martin /* HSA BRIG (binary representation of HSAIL) 1.0.1 representation description. Copyright (C) 2016 Free Software Foundation, Inc. This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. The contents of the file was created by extracting data structures, enum, typedef and other definitions from HSA Programmer's Reference Manual Version 1.0.1 (http://www.hsafoundation.com/standards/). HTML version is provided on the following link: http://www.hsafoundation.com/html/Content/PRM/Topics/PRM_title_page.htm */ #ifndef HSA_BRIG_FORMAT_H #define HSA_BRIG_FORMAT_H struct BrigModuleHeader; typedef uint16_t BrigKind16_t; typedef uint32_t BrigVersion32_t; typedef BrigModuleHeader *BrigModule_t; typedef uint32_t BrigDataOffset32_t; typedef uint32_t BrigCodeOffset32_t; typedef uint32_t BrigOperandOffset32_t; typedef BrigDataOffset32_t BrigDataOffsetString32_t; typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t; typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t; typedef uint8_t BrigAlignment8_t; enum BrigAlignment { BRIG_ALIGNMENT_NONE = 0, BRIG_ALIGNMENT_1 = 1, BRIG_ALIGNMENT_2 = 2, BRIG_ALIGNMENT_4 = 3, BRIG_ALIGNMENT_8 = 4, BRIG_ALIGNMENT_16 = 5, BRIG_ALIGNMENT_32 = 6, BRIG_ALIGNMENT_64 = 7, BRIG_ALIGNMENT_128 = 8, BRIG_ALIGNMENT_256 = 9 }; typedef uint8_t BrigAllocation8_t; enum BrigAllocation { BRIG_ALLOCATION_NONE = 0, BRIG_ALLOCATION_PROGRAM = 1, BRIG_ALLOCATION_AGENT = 2, BRIG_ALLOCATION_AUTOMATIC = 3 }; typedef uint8_t BrigAluModifier8_t; enum BrigAluModifierMask { BRIG_ALU_FTZ = 1 }; typedef uint8_t BrigAtomicOperation8_t; enum BrigAtomicOperation { BRIG_ATOMIC_ADD = 
0, BRIG_ATOMIC_AND = 1, BRIG_ATOMIC_CAS = 2, BRIG_ATOMIC_EXCH = 3, BRIG_ATOMIC_LD = 4, BRIG_ATOMIC_MAX = 5, BRIG_ATOMIC_MIN = 6, BRIG_ATOMIC_OR = 7, BRIG_ATOMIC_ST = 8, BRIG_ATOMIC_SUB = 9, BRIG_ATOMIC_WRAPDEC = 10, BRIG_ATOMIC_WRAPINC = 11, BRIG_ATOMIC_XOR = 12, BRIG_ATOMIC_WAIT_EQ = 13, BRIG_ATOMIC_WAIT_NE = 14, BRIG_ATOMIC_WAIT_LT = 15, BRIG_ATOMIC_WAIT_GTE = 16, BRIG_ATOMIC_WAITTIMEOUT_EQ = 17, BRIG_ATOMIC_WAITTIMEOUT_NE = 18, BRIG_ATOMIC_WAITTIMEOUT_LT = 19, BRIG_ATOMIC_WAITTIMEOUT_GTE = 20 }; struct BrigBase { uint16_t byteCount; BrigKind16_t kind; }; typedef uint8_t BrigCompareOperation8_t; enum BrigCompareOperation { BRIG_COMPARE_EQ = 0, BRIG_COMPARE_N
Re: [hsa merge 09/10] Majority of the HSA back-end
Hi, bootstrapping on i686-linux revealed the need for the following simple patch. I've run into two types of compilation errors on powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL somehow expanding to undeclared rs6000_xcoff_strip_dollar). I plan to work around them quickly by making most of the contents of hsa-*.c files compiled only conditionally (and leave potential hsa support on non-linux platforms for later), but I will not have time to do the change and test it properly until Monday. But that will hopefully really be it, Martin 2016-01-16 Martin Jambor * hsa-dump.c (dump_hsa_symbol): Add missing argument cast. diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c index af79bcb..c5f1f69 100644 --- a/gcc/hsa-dump.c +++ b/gcc/hsa-dump.c @@ -720,7 +720,7 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol) hsa_type_name (symbol->m_type & ~BRIG_TYPE_ARRAY_MASK), name); if (symbol->m_type & BRIG_TYPE_ARRAY_MASK) -fprintf (f, "[%lu]", symbol->m_dim); +fprintf (f, "[%lu]", (unsigned long) symbol->m_dim); } /* Dump textual representation of HSA IL operand OP to file F. */
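The cast in the patch above matters because "%lu" expects exactly an unsigned long, which is only 32 bits wide on i686-linux. A minimal sketch of the same pattern follows. It assumes the dimension is held in a 64-bit unsigned type (the patch does not show the declaration of m_dim), and format_dim is a hypothetical helper, not a function from hsa-dump.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format an array dimension as "[N]" into BUF.
   Assumption: the dimension is stored as a 64-bit unsigned value.  */
static int
format_dim (char *buf, size_t len, uint64_t dim)
{
  assert (buf != NULL && len > 0);

  /* "%lu" requires an unsigned long argument.  Passing a wider type
     without the cast is undefined behavior where long is 32 bits.
     The cast silently truncates values that do not fit.  */
  return snprintf (buf, len, "[%lu]", (unsigned long) dim);
}
```

If truncation on 32-bit hosts were a concern, a 64-bit format macro such as PRIu64 from inttypes.h would be one alternative to the cast.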
Re: [hsa merge 08/10] HSAIL BRIG description header file
Hi, On Sat, Jan 16, 2016 at 12:43:07PM +0100, Jakub Jelinek wrote: > On Fri, Jan 15, 2016 at 06:23:05PM +0100, Martin Jambor wrote: > > BRIG_KIND_OPERAND_REGISTER = 0x300a, > > BRIG_KIND_OPERAND_STRING = 0x300b, > > BRIG_KIND_OPERAND_WAVESIZE = 0x3009c, > > BRIG_KIND_OPERAND_END = 0x300d > > The above looks weird, I'd have expected BRIG_KIND_OPERAND_WAVESIZE > to be 0x300c instead. Bug in the standard? > As typedef uint16_t BrigKind16_t;, I'm afraid this doesn't even fit > into the data type. Note the original brig header you've posted > had this fixed. > That is clearly a bug. We did not catch it when comparing the compiler binary because we never use this constant. Have you found this by hand or did you do any more systematic comparison? BRIG is always validated when finalized and I believe that fortunately this particular bug would be caught by that, as would the majority of similar "random" ones. I am going to commit the following patch to the branch. Thanks for spotting this. Martin 2016-01-18 Martin Jambor * hsa-brig-format.h (BrigKind): Fix the value of BRIG_KIND_OPERAND_WAVESIZE. --- gcc/hsa-brig-format.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h index 247799b..e1c6cd2 100644 --- a/gcc/hsa-brig-format.h +++ b/gcc/hsa-brig-format.h @@ -303,7 +303,7 @@ enum BrigKind BRIG_KIND_OPERAND_OPERAND_LIST = 0x3009, BRIG_KIND_OPERAND_REGISTER = 0x300a, BRIG_KIND_OPERAND_STRING = 0x300b, - BRIG_KIND_OPERAND_WAVESIZE = 0x3009c, + BRIG_KIND_OPERAND_WAVESIZE = 0x300c, BRIG_KIND_OPERAND_END = 0x300d }; -- 2.6.4
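A range check of the kind Jakub describes (0x3009c does not fit into a uint16_t) could in principle be made mechanical. The following is only a sketch: the SKETCH_ enumerators mirror the four values quoted in the patch, and fits_kind16 is a hypothetical helper, neither of which exists in hsa-brig-format.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t BrigKind16_t;

/* Only the enumerators quoted in the patch above, with the corrected
   value.  The SKETCH_ prefix marks them as illustrative.  */
enum sketch_brig_kind
{
  SKETCH_KIND_OPERAND_REGISTER = 0x300a,
  SKETCH_KIND_OPERAND_STRING = 0x300b,
  SKETCH_KIND_OPERAND_WAVESIZE = 0x300c,
  SKETCH_KIND_OPERAND_END = 0x300d
};

/* Return nonzero if V survives a round trip through BrigKind16_t,
   that is, if it can be stored in 16 bits without truncation.  */
static int
fits_kind16 (long v)
{
  return v >= 0 && (long) (BrigKind16_t) v == v;
}

/* Compile-time variant: the build fails if a constant is out of range.  */
static_assert (SKETCH_KIND_OPERAND_WAVESIZE <= UINT16_MAX,
	       "BRIG kind must fit in 16 bits");
static_assert (SKETCH_KIND_OPERAND_END <= UINT16_MAX,
	       "BRIG kind must fit in 16 bits");
```

With the original value 0x3009c in place of 0x300c, the first static_assert would fail at compile time, which is earlier than BRIG validation at finalization.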
Re: [hsa merge 09/10] Majority of the HSA back-end
Hi, On Sat, Jan 16, 2016 at 09:58:51AM +0100, Jakub Jelinek wrote: > On Sat, Jan 16, 2016 at 12:49:12AM +0100, Martin Jambor wrote: > > bootstrapping on i686-linux revealed the need for the following simple > > patch. I've run into two types of compilation errors on > > powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL > > somehow expanding to undeclared rs6000_xcoff_strip_dollar). I plan to > > workaround them quickly by making most of the contents of hsa-*.c > > files compiled only conditionally (and leave potential hsa support on > > non-linux platforms for later), but I will not have time to do the > > change and test it properly until Monday. > > > > But that will hopefully really be it, > > IMHO you'd be best to write your own helpers for conversion to little > endian (and back). > gcc configure already has AC_C_BIGENDIAN (dunno how it handles pdp endian > host though, so not sure if it is safe to rely on that), for recent GCC > you can use __BYTE_ORDER__ macro to check endianity and __builtin_bswap*. > So perhaps just > #if GCC_VERSION >= 4006 > // use __BYTE_ORDER__ and __builtin_bswap or nothing > #else > // provide a safe slower default, with shifts and masking > #endif > > As for rs6000_xcoff_strip_dollar, look at other sources that use it what > headers they do include, bet you want to #include "tm_p.h" to make it work. > thanks for the suggestion. With the following two patches, I can compile the HSA branch on powerpc-aix. I'm going to prepare a new patch with them, bootstrap it on x86_64, i686 and ppc-aix and unless something new pops up again, I will commit it either tonight or early tomorrow morning. I have tested the slow paths of the little endian conversion only very rudimentarily, but I did test them. OTOH, I am actually not quite sure how 64-bit-wide numbers are laid out on PDP-endian systems. But I guess it is OK to fix those only later if I am wrong.
I am also willing to incorporate any feedback later, even if it is only a matter of style. Thanks, Martin 2016-01-18 Martin Jambor * hsa-brig.c: Include target.h and tm_p.h. --- gcc/hsa-brig.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index 9260c21..ee06804 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -23,6 +23,8 @@ along with GCC; see the file COPYING3. If not see #include "system.h" #include "coretypes.h" #include "tm.h" +#include "target.h" +#include "tm_p.h" #include "is-a.h" #include "vec.h" #include "hash-table.h" -- 2.6.4 2016-01-18 Martin Jambor * hsa-brig.c (lendian16): New function. Changed all uses of htole16 to use it. (lendian32): New function. Changed all uses of htole32 to use it. (lendian64): New function. Changed all uses of htole64 to use it. --- gcc/hsa-brig.c | 412 ++--- 1 file changed, 245 insertions(+), 167 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index d4e644f..9260c21 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -44,6 +44,83 @@ along with GCC; see the file COPYING3. If not see #include "hsa.h" #include "gomp-constants.h" +/* Convert VAL to little endian form, if necessary. */ + +static uint16_t +lendian16 (uint16_t val) +{ +#if GCC_VERSION >= 4006 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + return val; +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + return __builtin_bswap16 (val); +#else /* __ORDER_PDP_ENDIAN__ */ + return val; +#endif +#else +// provide a safe slower default, with shifts and masking +#ifndef WORDS_BIGENDIAN + return val; +#else + return (val >> 8) | (val << 8); +#endif +#endif +} + +/* Convert VAL to little endian form, if necessary. 
*/ + +static uint32_t +lendian32 (uint32_t val) +{ +#if GCC_VERSION >= 4006 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + return val; +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + return __builtin_bswap32 (val); +#else /* __ORDER_PDP_ENDIAN__ */ + return (val >> 16) | (val << 16); +#endif +#else +// provide a safe slower default, with shifts and masking +#ifndef WORDS_BIGENDIAN + return val; +#else + val = ((val & 0xff00ff00) >> 8) | ((val & 0xff00ff) << 8); + return (val >> 16) | (val << 16); +#endif +#endif +} + +/* Convert VAL to little endian form, if necessary. */ + +static uint64_t +lendian64 (uint64_t val) +{ +#if GCC_VERSION >= 4006 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + return val; +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + return __builtin_bswap64 (val); +#else /* __ORDER_PDP_ENDIAN__ */ + return (((val & 0x) << 48) + | ((val & 0x) << 16)
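For comparison with the "safe slower default, with shifts and masking" mentioned above, here is a sketch of a conversion that does not need to know the host byte order at all, so it also sidesteps the PDP-endian question. It is not the code from the patch. The sketch_ names are hypothetical, and the approach (building the byte sequence explicitly and copying it into the result) is a swapped-in alternative to the preprocessor-selected byte swaps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return a value whose in-memory bytes are VAL in little-endian order,
   whatever the host byte order is.  */
static uint32_t
sketch_lendian32 (uint32_t val)
{
  unsigned char bytes[4];
  uint32_t result;

  assert (sizeof (result) == sizeof (bytes));

  /* Shifts operate on values, not on memory, so bytes[0] always
     receives the least significant byte.  */
  for (int i = 0; i < 4; i++)
    bytes[i] = (unsigned char) ((val >> (8 * i)) & 0xff);

  memcpy (&result, bytes, sizeof (result));
  return result;
}

/* The same technique for 64-bit values.  */
static uint64_t
sketch_lendian64 (uint64_t val)
{
  unsigned char bytes[8];
  uint64_t result;

  assert (sizeof (result) == sizeof (bytes));

  for (int i = 0; i < 8; i++)
    bytes[i] = (unsigned char) ((val >> (8 * i)) & 0xff);

  memcpy (&result, bytes, sizeof (result));
  return result;
}
```

On a little-endian host an optimizing compiler will typically reduce these to a plain move, but that is an expectation rather than something verified here.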
Re: [hsa merge 00/10] Merge of HSA branch
Hi, On Wed, Jan 13, 2016 at 06:39:25PM +0100, Martin Jambor wrote: > Hi, > > this is hopefully the last big re-post of the HSA patches... I have committed the combined patch as revision 232549 after bootstrapping and testing all languages on x86_64-linux and i686-linux and verifying I did not break powerpc-aix more than it already was. I will be updating the gcc offloading wiki in a few days, meanwhile you can use the README.hsa file from the branch: https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup I will also be posting followup testsuite patches. > > Thanks everybody for patience and feedback. While we are of course > opened for more of it, let's also hope the approval process will > finish soon as it should now. I can't but repeat my thanks, especially to Jakub for the review and help with the many last-minute issues. Martin
[PR 69355] Correct hole detection when total_scalarization fails
Hi, PR 69355 has revealed that when SRA attempts total scalarization of an aggregate but this fails because the user type-casts a scalar field and stores into it a smaller aggregate (and the scalar field is not written to, whether directly or as a part of an aggregate store), the pass can lose track of unscalarized data there. I think that this can happen only when violating strict aliasing rules but with -fno-strict-aliasing it should work. Fixed thusly with the patch below (the condition is there to avoid detecting padding after aggregate-fields in totally-scalarized aggregates as unscalarized data). Bootstrapped and tested on x86_64-linux. OK for trunk? And the gcc-5 branch? Thanks, Martin 2016-01-26 Martin Jambor PR tree-optimization/69355 * tree-sra.c (analyze_access_subtree): Correct hole detection when total_scalarization fails. testsuite/ * gcc.dg/tree-ssa/pr69355.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr69355.c | 44 + gcc/tree-sra.c | 2 +- 2 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr69355.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c new file mode 100644 index 000..f515c21 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69355.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-options "-O -fno-strict-aliasing" } */ + +struct S +{ + void *a; + long double b; +}; + +struct Z +{ + long long l; + short s; +} __attribute__((packed)); + +struct S __attribute__((noclone, noinline)) +foo (void *v, struct Z *z) +{ + struct S t; + t.a = v; + *(struct Z *) &t.b = *z; + return t; +} + +struct Z gz; + +int +main (int argc, char **argv) +{ + struct S s; + + if (sizeof (long double) < sizeof (struct Z)) +return 0; + + gz.l = 0xbeef; + gz.s = 0xab; + + s = foo ((void *) 0, &gz); + + if ((((struct Z *) &s.b)->l != gz.l) + || (((struct Z *) &s.b)->s != gz.s)) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 740542f..b0e737a
100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -2421,7 +2421,7 @@ analyze_access_subtree (struct access *root, struct access *parent, if (covered_to < limit) hole = true; - if (scalar) + if (scalar || !allow_replacements) root->grp_total_scalarization = 0; } -- 2.7.0
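The "covered_to < limit" test visible in the patch context is the core of hole detection: children of an access are walked in offset order and any byte range not covered by a scalarized child counts as a hole. A heavily simplified sketch of that idea follows. It is not the code of analyze_access_subtree; struct sketch_access and sketch_has_hole are hypothetical, and the real function tracks considerably more state (and, with this patch, also clears grp_total_scalarization when replacements are not allowed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for an SRA access.  */
struct sketch_access
{
  long offset;   /* Start of the access within the aggregate.  */
  long size;     /* Size of the access.  */
  bool covered;  /* True if the access is fully scalarized.  */
};

/* Return true if the range [START, LIMIT) is not fully covered by
   CHILDREN, which are assumed to be sorted by offset and not to
   overlap.  */
static bool
sketch_has_hole (long start, long limit,
		 const struct sketch_access *children, size_t n)
{
  long covered_to = start;
  bool hole = false;

  assert (children != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      /* A gap before this child is a hole.  */
      if (covered_to < children[i].offset)
	hole = true;

      if (children[i].covered)
	covered_to = children[i].offset + children[i].size;
      else
	hole = true;
    }

  /* Anything left after the last child is a hole as well.  */
  if (covered_to < limit)
    hole = true;

  return hole;
}
```

In this sketch padding between fields is also reported as a hole, which is the situation the parenthetical remark about padding in the message above is concerned with.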
Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
On Fri, Jan 22, 2016 at 02:18:38PM +0100, Bernd Schmidt wrote: > On 01/22/2016 09:36 AM, Jakub Jelinek wrote: > > > >I think it is a bad idea to go against what the user wrote. Warning that > >some code might not be efficient? Perhaps (if properly guarded with some > >warning option one can turn off, either on a per-source file or using > >pragmas even more fine grained). But by default not offloading? That is > >just wrong. > > I'm leaning more towards Thomas' side of the argument. The kernels construct > is a hint, a "do your best" request to the compiler. If the compiler sees > that it can't parallelize a loop inside a kernels region, it's probably best > not to offload it. > Shouldn't such optimization feedback be output in MSG_NOTE dumps? Vectorizer uses it to inform the user what it is doing, supposedly with the intention to help the programmer find out why specific loops are not vectorized (and run slowly). I have also decided to use it to inform the user whether a combination of OpenMP constructs is gridified or not. Unfortunately, notes seem to appear only in "detailed" dumps, which often are not the best place for users to look into because of too much information on gcc internals. So the user interface aspect of notes could perhaps be re-thought a bit. In any event, I think that at least in the near term, good compiler feedback could ease the efficient use of accelerators quite a lot, like (they say) it did with early auto-vectorizing compilers. Martin
Re: [hsa merge 00/10] Merge of HSA branch
Hi, sorry for getting to this so late: On Thu, Jan 21, 2016 at 05:10:17PM -0600, Gerald Pfeifer wrote: > On Tue, 19 Jan 2016, Richard Biener wrote: > > I think the merge warrants a NEWS entry on gcc.gnu.org/ > > ...and gcc-6/changes.html. :-) > > Martin, happy to help. Want to propose some text (or even patch)? > So what would you think about the following? Perhaps it is too verbose but I wanted to mention the few areas users should know have changed, if they happen to try HSA out. I can certainly cut it down a bit. Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.52 diff -u -r1.52 changes.html --- changes.html25 Jan 2016 15:09:55 - 1.52 +++ changes.html27 Jan 2016 14:15:49 - @@ -272,6 +272,30 @@ +Heterogeneous Systems Architecture + + GCC can now generate HSAIL for simple OpenMP device constructs + if configured with --enable-offload-targets=hsa. A new + libgomp plugin then runs these HSAIL kernels implementing these + constructs on HSA capable GPUs via the standard HSA run-time. + + If the HSA compilation back-end determines it cannot output HSAIL + for a particular input, it gives a warning by default. These + warnings can be suppressed with -Wno-hsa. To give a + few examples, the HSA back-end does not implement compilation of + code using function pointers and variable-sized variables and + parameters, functions with variadic arguments as well as a number of + other less common programming constructs. + + When compilation for HSA is enabled, the compiler attempts to + compile composite OpenMP constructs + +#pragma omp target teams distribute parallel for +into parallel HSA GPU kernels.
+ + + + IA-32/x86-64 GCC now supports the Intel CPU named Skylake with AVX-512 extensions The change to the news on the main page might then be: Index: index.html === RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v retrieving revision 1.992 diff -u -r1.992 index.html --- index.html 24 Jan 2016 23:54:36 - 1.992 +++ index.html 27 Jan 2016 14:16:25 - @@ -52,6 +52,13 @@ + Heterogeneous Systems Architecture support in GCC + [2016-01-27] + http://www.hsafoundation.com/ Heterogeneous Systems + Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa + support was added to GCC. Contributed by Martin Jambor, Martin Liška + and Michael Matz from SUSE. + GCC 5.3 released [2015-12-04] Any comments welcome. Thanks, Martin
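To make the proposed changes.html text a bit more concrete, here is a small sketch of a loop written with the composite construct named there. Whether a given compiler build actually turns this particular loop into an HSA GPU kernel is an assumption and not something the text above guarantees. The function vec_add and the map clauses are illustrative additions. Without OpenMP support enabled the pragma is ignored and the loop runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: element-wise addition of two arrays using the
   composite construct mentioned in the proposed release notes.  */
static void
vec_add (size_t n, const float *a, const float *b, float *c)
{
  assert (a != NULL && b != NULL && c != NULL);

#pragma omp target teams distribute parallel for \
  map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Per the proposed text, code that uses function pointers, variable-sized variables or variadic functions inside such a region would instead trigger the warning that -Wno-hsa suppresses.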
Re: Martin Jambor appointed HSA Maintainer
Hi, On Fri, Dec 18, 2015 at 08:41:41AM -0500, David Edelsohn wrote: > I am pleased to announce that the GCC Steering Committee has > appointed Martin Jambor as HSA maintainer. > > Please join me in congratulating Martin on his new role. > Martin, please update your listing in the MAINTAINERS file. > thank you very much for your trust. I will do my best when carrying out the associated duties. Now that HSA is also in, I have committed the following change to the MAINTAINERS file. Martin 2016-01-29 Martin Jambor * MAINTAINERS (hsa maintainers): Add myself. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index a5afeb7..aa757ea 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -209,6 +209,7 @@ fixincludes Bruce Korb *gimpl*Jason Merrill gcse.c Jeff Law global opt framework Jeff Law +hsa Martin Jambor jump.c David S. Miller web pages Gerald Pfeifer config.sub/config.guessBen Elliston -- 2.7.0
[hsa] Atomic access memory model fixes
Hi,

this is a followup to comments by Jakub and Richi on handling of memory models in HSA atomic operations:

- I have made user-visible diagnostics use short lowercase words rather than constant identifiers.
- I have added masking by MEMMODEL_BASE_MASK where appropriate.
- I have made sure that the warning code does not crash even when it encounters an unknown model and that it never warns multiple times.
- I have fixed handling of atomic load operations, which wrongly insisted on release semantics instead of acquire (apart from relaxed).
- And last but not least, after looking at the respective documentation, I have convinced myself that __ATOMIC_SEQ_CST can be implemented using the HSA scacq, screl and scar memory orders, so I implemented that.

Bootstrapped and tested on x86_64-linux. Since all of the above seems to be worth fixing and low risk, I am going to commit it to trunk even at this stage, even though of course nothing in HSA is a regression.

Thanks,

Martin

2016-01-29  Martin Jambor

	* hsa-gen.c (get_memory_order_name): Mask with MEMMODEL_BASE_MASK.
	Use short lowercase names.
	(get_memory_order): Mask with MEMMODEL_BASE_MASK.  Support
	MEMMODEL_CONSUME with acquire semantics and MEMMODEL_SEQ_CST with
	acq_rel one.  Protect warning against segfaults if
	get_memory_order_name returns NULL.
	(gen_hsa_ternary_atomic_for_builtin): Support MEMMODEL_SEQ_CST with
	release semantics.  Do not warn if get_memory_order already did.
	(gen_hsa_insns_for_call): Support MEMMODEL_SEQ_CST with acquire
	semantics.  Fix check for relaxed or acquire semantics.  Do not warn
	if get_memory_order already did.
--- gcc/hsa-gen.c | 59 --- 1 file changed, 40 insertions(+), 19 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index e8f80da..768c2cf 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -4415,20 +4415,20 @@ get_address_from_value (tree val, hsa_bb *hbb) static const char * get_memory_order_name (unsigned memmodel) { - switch (memmodel) + switch (memmodel & MEMMODEL_BASE_MASK) { case MEMMODEL_RELAXED: - return "__ATOMIC_RELAXED"; + return "relaxed"; case MEMMODEL_CONSUME: - return "__ATOMIC_CONSUME"; + return "consume"; case MEMMODEL_ACQUIRE: - return "__ATOMIC_ACQUIRE"; + return "acquire"; case MEMMODEL_RELEASE: - return "__ATOMIC_RELEASE"; + return "release"; case MEMMODEL_ACQ_REL: - return "__ATOMIC_ACQ_REL"; + return "acq_rel"; case MEMMODEL_SEQ_CST: - return "__ATOMIC_SEQ_CST"; + return "seq_cst"; default: return NULL; } @@ -4440,21 +4440,31 @@ get_memory_order_name (unsigned memmodel) static BrigMemoryOrder get_memory_order (unsigned memmodel, location_t location) { - switch (memmodel) + switch (memmodel & MEMMODEL_BASE_MASK) { case MEMMODEL_RELAXED: return BRIG_MEMORY_ORDER_RELAXED; +case MEMMODEL_CONSUME: + /* HSA does not have an equivalent, but we can use the slightly stronger +ACQUIRE. */ case MEMMODEL_ACQUIRE: return BRIG_MEMORY_ORDER_SC_ACQUIRE; case MEMMODEL_RELEASE: return BRIG_MEMORY_ORDER_SC_RELEASE; case MEMMODEL_ACQ_REL: +case MEMMODEL_SEQ_CST: + /* Callers implementing a simple load or store need to remove the release +or acquire part respectively. */ return BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE; default: - HSA_SORRY_ATV (location, -"support for HSA does not implement memory model: %s", -get_memory_order_name (memmodel)); - return BRIG_MEMORY_ORDER_NONE; + { + const char *mmname = get_memory_order_name (memmodel); + HSA_SORRY_ATV (location, + "support for HSA does not implement the specified " + " memory model%s %s", + mmname ? ": " : "", mmname ? 
mmname : ""); + return BRIG_MEMORY_ORDER_NONE; + } } } @@ -4523,13 +4533,20 @@ gen_hsa_ternary_atomic_for_builtin (bool ret_orig, nops = 2; } - if (acode == BRIG_ATOMIC_ST && memorder != BRIG_MEMORY_ORDER_RELAXED - && memorder != BRIG_MEMORY_ORDER_SC_RELEASE) + if (acode == BRIG_ATOMIC_ST) { - HSA_SORRY_ATV (gimple_location (stmt), -"support for HSA does not implement memory model for " -"ATOMIC_ST: %s", get_memory_order_name (mmodel)); - return; + if (memorder == BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE) + memorder = BRIG_MEMORY_ORDER_SC_RELEASE; + + if (memorder != BRIG_MEMORY_ORDER_RELAXED + &
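To illustrate the mapping discussed in this patch, here is a minimal standalone sketch. It is not the hsa-gen.c code: the enum names, the mask value and the order_for_load/order_for_store helpers are hypothetical stand-ins for GCC's MEMMODEL_* values and the BRIG memory orders. It only shows the idea of masking off flag bits first and then dropping the part of acq_rel/seq_cst that a plain load or store cannot have. A real implementation would still have to reject whatever orders remain unsupported for a given operation.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's memory model values; the real ones
   live in GCC's headers and may differ.  */
enum model { M_RELAXED, M_CONSUME, M_ACQUIRE, M_RELEASE, M_ACQ_REL, M_SEQ_CST };
#define MODEL_BASE_MASK 0xffffu /* assumed: low bits hold the base model */

/* Hypothetical stand-ins for the BRIG memory orders.  */
enum order { O_NONE, O_RELAXED, O_SC_ACQUIRE, O_SC_RELEASE, O_SC_ACQ_REL };

/* Map a (possibly flag-decorated) memory model to a memory order.  */
static enum order
memory_order_for (unsigned memmodel)
{
  switch (memmodel & MODEL_BASE_MASK)
    {
    case M_RELAXED:
      return O_RELAXED;
    case M_CONSUME: /* No equivalent; use the slightly stronger acquire.  */
    case M_ACQUIRE:
      return O_SC_ACQUIRE;
    case M_RELEASE:
      return O_SC_RELEASE;
    case M_ACQ_REL:
    case M_SEQ_CST:
      return O_SC_ACQ_REL;
    default:
      return O_NONE; /* Unknown model: the caller should diagnose it.  */
    }
}

/* A plain store has no acquire part, so drop it.  */
static enum order
order_for_store (unsigned memmodel)
{
  enum order o = memory_order_for (memmodel);
  return o == O_SC_ACQ_REL ? O_SC_RELEASE : o;
}

/* A plain load has no release part, so drop it.  */
static enum order
order_for_load (unsigned memmodel)
{
  enum order o = memory_order_for (memmodel);
  return o == O_SC_ACQ_REL ? O_SC_ACQUIRE : o;
}
```

Masking before the switch is what keeps a model with extra high bits set from falling into the default case.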
Re: [hsa merge 00/10] Merge of HSA branch
Hi, On Thu, Jan 28, 2016 at 08:18:27AM -0700, Gerald Pfeifer wrote: > > This is okay with the changes/considering the questions above. > thanks for the feedback. I have committed the following after incorporating the comments. Martin Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.52 diff -u -r1.52 changes.html --- changes.html25 Jan 2016 15:09:55 - 1.52 +++ changes.html2 Feb 2016 14:09:11 - @@ -272,6 +272,30 @@ +Heterogeneous Systems Architecture + + GCC can now generate HSAIL (Heterogeneous System Architecture + Intermediate Language) for simple OpenMP device constructs if + configured with --enable-offload-targets=hsa. A new + libgomp plugin then runs the HSA GPU kernels implementing these + constructs on HSA capable GPUs via a standard HSA run time. + + If the HSA compilation back end determines it cannot output HSAIL + for a particular input, it gives a warning by default. These + warnings can be suppressed with -Wno-hsa. To give a few + examples, the HSA back end does not implement compilation of code + using function pointers, automatic allocation of variable sized + arrays, functions with variadic arguments as well as a number of + other less common programming constructs. + + When compilation for HSA is enabled, the compiler attempts to + compile composite OpenMP constructs + +#pragma omp target teams distribute parallel for +into parallel HSA GPU kernels. 
+ + + IA-32/x86-64 GCC now supports the Intel CPU named Skylake with AVX-512 extensions Index: index.html === RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v retrieving revision 1.993 diff -u -r1.993 index.html --- index.html 30 Jan 2016 06:01:48 - 1.993 +++ index.html 2 Feb 2016 14:10:25 - @@ -50,6 +50,13 @@ News + Heterogeneous Systems Architecture support + [2016-01-27] + http://www.hsafoundation.com/";> Heterogeneous Systems + Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa";> + support was added to GCC, contributed by Martin Jambor, Martin Liška + and Michael Matz from SUSE. + GCC 5.3 released [2015-12-04]
[hsa branch] Map collapse(2) and collapse(3) to HSA grid dimensions
Hi,

with HSA merged, the hsa branch can be used for development of new features again. I have therefore committed there a patch which I finished after the merge proposal and had so far kept in a private branch. It allows collapse(2) and collapse(3) clauses to be gridified and the individual loops to be mapped directly to HSA grid dimensions.

In order to achieve that, I needed to introduce hsa-specific builtins which expand to HSAIL instructions giving information about specific HSA grid dimensions. I hope I have done that right, any comments are welcome. Other than that, the changes are small because as I was restructuring the code, I was moving it in this direction for some time already.

Committed to the branch (a few days ago actually, sorry for that).

Thanks,

Martin

2016-01-26  Martin Jambor

gcc/
	* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def.
	* builtins.def: Include hsa-builtins.def.
	(DEF_HSA_BUILTIN): Define.
	* hsa-builtins.def: New file.
	* hsa-gen.c (query_hsa_grid): Accept dimension as an hsa_op_immed.
	Add a new override.
	(gen_hsa_insns_for_call): Handle BUILT_IN_HSA_GET_WORKITEM_ABSID.
	* omp-low.c (grid_get_kernel_launch_attributes): Support up to three
	dimensions.
	(grid_expand_omp_for_loop): Likewise.
	(lower_omp_for_lastprivate): Do not extract looptemps from grid loops.
	(grid_target_follows_gridifiable_pattern): Allow collapse up to 3.
	* tree-inline.h (copy_body_data): New field
	decl_creation_prevention_level.  Moved remap_var_for_cilk to minimize
	padding.

gcc/fortran/
	* f95-lang.c: Include hsa-builtins.def.
	(DEF_HSA_BUILTIN): Define.

libgomp/
	* plugin/plugin-hsa.c (parse_target_attributes): Support up to three
	dimensions.
	(get_group_size): New function.
	(GOMP_OFFLOAD_run): Support up to three dimensions.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index ab9cbbf..a996708 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -899,7 +899,8 @@ RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h PARAMS_H = params.h params-enum.h params.def BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \ - gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def + gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def \ + hsa-builtins.def INTERNAL_FN_DEF = internal-fn.def INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \ diff --git a/gcc/builtins.def b/gcc/builtins.def index 2fc7f65..14d2335 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -188,6 +188,16 @@ along with GCC; see the file COPYING3. If not see || flag_cilkplus \ || flag_offload_abi != OFFLOAD_ABI_UNSET)) +#undef DEF_HSA_BUILTIN +#ifdef ENABLE_HSA +#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ + false, false, true, ATTRS, false, \ + (!flag_disable_hsa)) +#else +#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) +#endif + /* Builtin used by implementation of Cilk Plus. Most of these are decomposed by the compiler but a few are implemented in libcilkrts. */ #undef DEF_CILK_BUILTIN_STUB @@ -932,6 +942,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) /* Offloading and Multi Processing builtins. */ #include "omp-builtins.def" +/* Heterogeneous Systems Architecture. */ +#include "hsa-builtins.def" + /* Cilk keywords builtins. 
*/ #include "cilk-builtins.def" diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c index 9c3a311..efa750de 100644 --- a/gcc/fortran/f95-lang.c +++ b/gcc/fortran/f95-lang.c @@ -1234,6 +1234,17 @@ gfc_init_builtin_functions (void) #undef DEF_GOMP_BUILTIN } +#ifdef ENABLE_HSA + if (!flag_disable_hsa) +{ +#undef DEF_HSA_BUILTIN +#define DEF_HSA_BUILTIN(code, name, type, attr) \ + gfc_define_builtin ("__builtin_" name, builtin_types[type], \ + code, name, attr); +#include "../hsa-builtins.def" +} +#endif + gfc_define_builtin ("__builtin_trap", builtin_types[BT_FN_VOID], BUILT_IN_TRAP, NULL, ATTR_NOTHROW_LEAF_LIST); TREE_THIS_VOLATILE (builtin_decl_explicit (BUILT_IN_TRAP)) = 1; diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def new file mode 100644 index 000..e4681c1 --- /dev/null +++ b/gcc/hsa-builtins.def @@ -0,0 +1,31 @@ +/* This file contains the definitions and documentation for the + Offloading and Multi Processing builtins used in the GNU compiler. + Copyright (C) 2005-2015 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU Gene
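For readers unfamiliar with the construct, here is a small hypothetical example of the kind of loop nest the collapse(2) support is about. The function name and array sizes are made up, and whether a given loop is actually gridified depends on compiler checks that are not reproduced here. When compiled without OpenMP support the pragma is ignored and the loops simply run sequentially on the host.

```c
#define ROWS 4
#define COLS 8

/* Add two matrices.  With collapse(2) both loops form one iteration
   space, which a gridifying compiler may map to two grid dimensions.  */
void
matrix_add (float a[ROWS][COLS], float b[ROWS][COLS], float c[ROWS][COLS])
{
#pragma omp target teams distribute parallel for collapse(2) \
  map(to: a[0:ROWS], b[0:ROWS]) map(from: c[0:ROWS])
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      c[i][j] = a[i][j] + b[i][j];
}
```

The two loops are perfectly nested and their bounds do not depend on each other, which is what the collapse clause requires.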
Re: [RFC] Extend ipa-bitwise-cp with pointer alignment propagation
Hi,

sorry, my main desktop disk has died (a slow but certain death), so I am not particularly responsive either.

On Tue, Oct 04, 2016 at 12:37:38AM +0530, Prathamesh Kulkarni wrote:
> On 22 September 2016 at 17:26, Jan Hubicka wrote:
> > Yes, can you please verify that alignments it computes are monotonously
> > worse than those your new code computes and include the removal in the
> > next iteration of the patch?
> >
> > Otherwise the patch seems fine to me (modulo Richard's comments)
> I tried to verify the alignments are monotonously worse with the
> attached patch (verify.diff),
> which asserts that alignment lattice is not better than bits lattice
> during each propagation
> step in propagate_constants_accross_call().
> Does that look OK ?

After propagation, there should be no TOP lattices anywhere. That would mean we have not deleted an unreachable node. Apart from that, yes.

>
> ipa-cp-alignment has better alignments than ipa-bit-cp in following cases:
>
> a) ipa_get_type() returns NULL: ipa-bits-cp sets lattice to bottom if
> ipa_get_type (param) returns NULL,
> for instance in case of K&R function, while ipa-cp-alignment doesn't

What do you mean by "for instance?" What are the other cases when it happens?

> look at param types,
> and can propagate alignments.
> The following assert:
> if (bits_lattice.bottom_p ())
> gcc_assert (align_lattice.bottom_p())
>
> triggered for 400.perlbench, 403.gcc, 456.hmmer and 481.wrf due to

That is quite a few more examples than I had anticipated, so they all used K&R? (But thanks for trying the benchmarks diligently.) Have you also tried this with LTO?

> ipa_get_type()
> returning NULL. I am not really sure how to handle this case, since we
> need to know parameter's
> type during bits propagation for obtaining precision.
>
> b) This happens for attached test-case (test.i),
> which is a reduced (and slightly modified) test-case from 458.sjeng.
> Bits propagation sets lattice to bottom, while alignment propagation
> propagates .
yes, I agree we do not need to worry about the case when alignment is 1. I am only slightly concerned how often ipa_get_type is NULL, so it would be nice if you looked into those cases once more to make sure that we do not miss some bug or something that we could handle easily. But if it is only K&R, I think it is fine. Thanks, Martin
[hsa-branch 4/9] Add expansion of reciprocal of square root
Hi,

this patch simply adds expansion of the reciprocal square root gimple internal function into its HSAIL equivalent.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* hsa-gen.c (gen_hsa_insn_for_internal_fn_call): Also handle IFN_RSQRT.
---
 gcc/hsa-gen.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index deb2a07..efb87a0 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5386,6 +5386,10 @@ gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
       break;
 
+    case IFN_RSQRT:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NRSQRT, stmt, hbb);
+      break;
+
     case IFN_TRUNC:
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
       break;
-- 
2.10.0
[hsa-branch 2/9] Lastprivate lowering for gridified kernels
Hi,

this patch implements the lastprivate data sharing clauses of gridified OpenMP looping constructs. It adds code to construct a special condition to identify the "last" loop iteration using special HSA instructions, because that way we do not need information about all HSA dimensions conveyed from callers and could modify only a small fraction of the non-gridification code. On the gridification side, it creates group-segment copies of internal loop lastprivate variables as a means to transfer the value from the "last" work-item to all work-items that then continue working with the value.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* gimple.h (GF_OMP_FOR_GRID_PHONY): Added comment.
	(GF_OMP_FOR_GRID_INTRA_GROUP): New.
	(gimple_omp_for_grid_phony): Added checking assert.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_for_grid_intra_group): New function.
	(gimple_omp_for_set_grid_intra_group): Likewise.
	(gimple_omp_for_grid_group_iter): Added checking assert.
	(gimple_omp_for_set_grid_group_iter): Likewise.
	* omp-low.c (lower_lastprivate_clauses): Also handle predicates that
	are not simple comparisons.
	(grid_lastprivate_predicate): New function.
	(lower_omp_for_lastprivate): Generate conditions for gridified kernels.
	(lower_omp_for): Adjust phony predicate call.
	(grid_parallel_clauses_gridifiable): Allow lastprivate.
	(grid_inner_loop_gridifiable_p): Likewise.
	(grid_mark_tiling_loops): Generate copies of lastprivate variables to
	group variables.
	(grid_mark_tiling_parallels_and_loops): Create binds for bodies of
	parallel statements.
	(grid_process_kernel_body_copy): Avoid reusing variable name.
--- gcc/gimple.h | 36 + gcc/omp-low.c | 235 +- 2 files changed, 187 insertions(+), 84 deletions(-) diff --git a/gcc/gimple.h b/gcc/gimple.h index ce3a161..3e84e6b0 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -162,7 +162,12 @@ enum gf_mask { GF_OMP_FOR_KIND_CILKSIMD = GF_OMP_FOR_SIMD | 1, GF_OMP_FOR_COMBINED= 1 << 4, GF_OMP_FOR_COMBINED_INTO = 1 << 5, +/* The following flag must not be used on GF_OMP_FOR_KIND_GRID_LOOP loop + statements. */ GF_OMP_FOR_GRID_PHONY = 1 << 6, +/* The following two flags should only be set on GF_OMP_FOR_KIND_GRID_LOOP + loop statements. */ +GF_OMP_FOR_GRID_INTRA_GROUP= 1 << 6, GF_OMP_FOR_GRID_GROUP_ITER = 1 << 7, GF_OMP_TARGET_KIND_MASK= (1 << 4) - 1, GF_OMP_TARGET_KIND_REGION = 0, @@ -5123,6 +5128,8 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body) static inline bool gimple_omp_for_grid_phony (const gomp_for *omp_for) { + gcc_checking_assert (gimple_omp_for_kind (omp_for) + != GF_OMP_FOR_KIND_GRID_LOOP); return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0; } @@ -5131,18 +5138,45 @@ gimple_omp_for_grid_phony (const gomp_for *omp_for) static inline void gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value) { + gcc_checking_assert (gimple_omp_for_kind (omp_for) + != GF_OMP_FOR_KIND_GRID_LOOP); if (value) omp_for->subcode |= GF_OMP_FOR_GRID_PHONY; else omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY; } +/* Return the kernel_intra_group of a GRID_LOOP OMP_FOR statement. */ + +static inline bool +gimple_omp_for_grid_intra_group (const gomp_for *omp_for) +{ + gcc_checking_assert (gimple_omp_for_kind (omp_for) + == GF_OMP_FOR_KIND_GRID_LOOP); + return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_INTRA_GROUP) != 0; +} + +/* Set kernel_intra_group flag of OMP_FOR to VALUE. 
*/ + +static inline void +gimple_omp_for_set_grid_intra_group (gomp_for *omp_for, bool value) +{ + gcc_checking_assert (gimple_omp_for_kind (omp_for) + == GF_OMP_FOR_KIND_GRID_LOOP); + if (value) +omp_for->subcode |= GF_OMP_FOR_GRID_INTRA_GROUP; + else +omp_for->subcode &= ~GF_OMP_FOR_GRID_INTRA_GROUP; +} + /* Return true if iterations of a grid OMP_FOR statement correspond to HSA groups. */ static inline bool gimple_omp_for_grid_group_iter (const gomp_for *omp_for) { + gcc_checking_assert (gimple_omp_for_kind (omp_for) + == GF_OMP_FOR_KIND_GRID_LOOP); return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_GROUP_ITER) != 0; } @@ -5151,6 +5185,8 @@ gimple_omp_for_grid_group_iter (const gomp_for *omp_for) static inline void gimple_omp_for_set_grid_group_iter (gomp_for *omp_for, bool value) { + gcc_checking_assert (gimple_omp_for_kind (omp_for) + == GF_OMP_FOR_KIND_GRID_LOOP); if (value) omp_for->subcode |= G
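The gimple.h part of this patch lets two flags share one bit and tells them apart by the loop kind, with an assert in each accessor. The following is a simplified standalone sketch of that pattern only; the types and names are hypothetical stand-ins and plain assert takes the place of gcc_checking_assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a loop statement.  */
enum loop_kind { KIND_FOR, KIND_GRID_LOOP };

enum
{
  FLAG_GRID_PHONY = 1 << 6,       /* valid only when kind != KIND_GRID_LOOP */
  FLAG_GRID_INTRA_GROUP = 1 << 6, /* valid only when kind == KIND_GRID_LOOP */
  FLAG_GRID_GROUP_ITER = 1 << 7   /* valid only when kind == KIND_GRID_LOOP */
};

struct loop_stmt
{
  enum loop_kind kind;
  unsigned subcode;
};

/* Read the phony flag; grid loops must never be asked about it.  */
static bool
loop_grid_phony (const struct loop_stmt *l)
{
  assert (l->kind != KIND_GRID_LOOP);
  return (l->subcode & FLAG_GRID_PHONY) != 0;
}

/* Read the intra-group flag; only grid loops carry it.  */
static bool
loop_grid_intra_group (const struct loop_stmt *l)
{
  assert (l->kind == KIND_GRID_LOOP);
  return (l->subcode & FLAG_GRID_INTRA_GROUP) != 0;
}

/* Set or clear the intra-group flag of a grid loop.  */
static void
loop_set_grid_intra_group (struct loop_stmt *l, bool value)
{
  assert (l->kind == KIND_GRID_LOOP);
  if (value)
    l->subcode |= FLAG_GRID_INTRA_GROUP;
  else
    l->subcode &= ~(unsigned) FLAG_GRID_INTRA_GROUP;
}
```

Because the bit means different things for different kinds, the asserts are the only thing that catches a caller reading the wrong flag.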
[hsa-branch 3/9] Handle simds within gridified loops gracefully
Hi, this patch deals with simd constructs in gridified OpenMP loops. Standalone simds are dealt with by forcing the gridified copy to have OMP_CLAUSE_SAFELEN_EXPR of one, while simds which are a part of a combined construct with the gridified parallel loop are simply discarded. Committed to the branch, queued for merge to trunk soon. Thanks, Martin 2016-10-03 Martin Jambor * omp-low.c (grid_find_ungridifiable_statement): Do not bail out for simd loops. (grid_inner_loop_gridifiable_p): Likewise. (grid_process_grid_body): New function. (grid_eliminate_combined_simd_part): Likewise. (grid_mark_tiling_loops): Use it. Walk body of the loop with grid_process_grid_body. (grid_process_kernel_body_copy): Likewise. --- gcc/omp-low.c | 137 +++--- 1 file changed, 122 insertions(+), 15 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 05015bd..a51474b 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -17478,17 +17478,6 @@ grid_find_ungridifiable_statement (gimple_stmt_iterator *gsi, *handled_ops_p = true; wi->info = stmt; return error_mark_node; - -case GIMPLE_OMP_FOR: - if ((gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD) - && gimple_omp_for_combined_into_p (stmt)) - { - *handled_ops_p = true; - wi->info = stmt; - return error_mark_node; - } - break; - default: break; } @@ -17614,10 +17603,6 @@ grid_inner_loop_gridifiable_p (gomp_for *gfor, grid_prop *grid) dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc, GRID_MISSED_MSG_PREFIX "the inner loop contains " "call to a noreturn function\n"); - else if (gimple_code (bad) == GIMPLE_OMP_FOR) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc, -GRID_MISSED_MSG_PREFIX "the inner loop contains " -"a simd construct\n"); else dump_printf_loc (MSG_MISSED_OPTIMIZATION, grid->target_loc, GRID_MISSED_MSG_PREFIX "the inner loop contains " @@ -18212,6 +18197,113 @@ grid_copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst, return NULL; } +/* Statement walker function to make adjustments to 
statements within the + gridifed kernel copy. */ + +static tree +grid_process_grid_body (gimple_stmt_iterator *gsi, bool *handled_ops_p, + struct walk_stmt_info *) +{ + *handled_ops_p = false; + gimple *stmt = gsi_stmt (*gsi); + if (gimple_code (stmt) == GIMPLE_OMP_FOR + && (gimple_omp_for_kind (stmt) & GF_OMP_FOR_SIMD)) + { +gomp_for *loop = as_a (stmt); +tree clauses = gimple_omp_for_clauses (loop); +tree cl = find_omp_clause (clauses, OMP_CLAUSE_SAFELEN); +if (cl) + OMP_CLAUSE_SAFELEN_EXPR (cl) = integer_one_node; +else + { + tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN); + OMP_CLAUSE_SAFELEN_EXPR (c) = integer_one_node; + OMP_CLAUSE_CHAIN (c) = clauses; + gimple_omp_for_set_clauses (loop, c); + } + } + return NULL_TREE; +} + +/* Given a PARLOOP that is a normal for looping construct but also a part of a + combined construct with a simd loop, eliminate the simd loop. */ + +static void +grid_eliminate_combined_simd_part (gomp_for *parloop) +{ + struct walk_stmt_info wi; + + memset (&wi, 0, sizeof (wi)); + wi.val_only = true; + enum gf_mask msk = GF_OMP_FOR_SIMD; + wi.info = (void *) &msk; + walk_gimple_seq (gimple_omp_body (parloop), find_combined_for, NULL, &wi); + gimple *stmt = (gimple *) wi.info; + /* We expect that the SIMD id the only statement in the parallel loop. */ + gcc_assert (stmt + && gimple_code (stmt) == GIMPLE_OMP_FOR + && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_SIMD) + && gimple_omp_for_combined_into_p (stmt) + && !gimple_omp_for_combined_p (stmt)); + gomp_for *simd = as_a (stmt); + + /* Copy over the iteration properties because the body refers to the index in + the bottmom-most loop. 
*/ + unsigned i, collapse = gimple_omp_for_collapse (parloop); + gcc_checking_assert (collapse == gimple_omp_for_collapse (simd)); + for (i = 0; i < collapse; i++) +{ + gimple_omp_for_set_index (parloop, i, gimple_omp_for_index (simd, i)); + gimple_omp_for_set_initial (parloop, i, gimple_omp_for_initial (simd, i)); + gimple_omp_for_set_final (parloop, i, gimple_omp_for_final (simd, i)); + gimple_omp_for_set_incr (parloop, i, gimple_omp_for_incr (simd, i)); +} + + tree *tgt= gimple_omp_for_clauses_ptr (parloop); + while (*tgt) +tgt = &OMP_CLAUSE_CHAIN (*tgt); + + /
[hsa-branch 6/9] Expand FMA_EXPR to HSAIL
Hi, the following patch adds expansion of fused multiply and add to HSAIL. The scalar variant is straightforwardly converted to an HSAIL equivalent while any vector instance is expanded into separate multiplication and additions. Committed to the branch, queued for merge to trunk soon. Thanks, Martin 2016-10-03 Martin Jambor * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Handle FMA_EXPR and ternary operators in general. Remove obsolete fallthrough comments. --- gcc/hsa-gen.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index ac83e9e..ad40087 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3076,6 +3076,23 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb) case NEGATE_EXPR: opcode = BRIG_OPCODE_NEG; break; +case FMA_EXPR: + /* There is a native HSA instruction for scalar FMAs but not for vector +ones. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) + { + hsa_op_reg *dest + = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); + gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); + return; + } + opcode = BRIG_OPCODE_MAD; + break; case MIN_EXPR: opcode = BRIG_OPCODE_MIN; break; @@ -3275,14 +3292,18 @@ gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb) switch (rhs_class) { case GIMPLE_TERNARY_RHS: - gcc_unreachable (); + { + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + hsa_insn_basic *insn = new hsa_insn_basic (4, opcode, dest->m_type, dest, + op1, op2, op3); + hbb->append_insn (insn); + } return; - /* Fall through */ case GIMPLE_BINARY_RHS: gen_hsa_binary_operation (opcode, dest, op1, op2, hbb); 
break; - /* Fall through */ + case GIMPLE_UNARY_RHS: gen_hsa_unary_operation (opcode, dest, op1, hbb); break; -- 2.10.0
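The vector case described above can be pictured with a small standalone sketch. It is only an illustration in plain C, with a made-up lane count and function name, of expanding a fused multiply-add into a multiplication into a temporary followed by an addition.

```c
#define LANES 4

/* Compute dest[i] = a[i] * b[i] + c[i] as two separate steps, the way
   the vector expansion emits a MUL into a temporary and then an ADD.  */
static void
vector_mul_add (const double *a, const double *b, const double *c,
                double *dest)
{
  double tmp[LANES];

  for (int i = 0; i < LANES; i++)
    tmp[i] = a[i] * b[i];
  for (int i = 0; i < LANES; i++)
    dest[i] = tmp[i] + c[i];
}
```

Note that a separate multiplication and addition round twice whereas a fused operation rounds once, so the two forms can differ in the last bit for some inputs.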
[hsa-branch 1/9] Builtins for gridsize and currentworkgroupsize
Hi,

the patch below makes the gridsize and currentworkgroupsize special HSA instructions available for omp lowering through builtins. They are then used by a subsequent patch to implement conditions determining the last iteration for the lastprivate OpenMP sharing clause.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* hsa-builtins.def (BUILT_IN_HSA_GRIDSIZE): New.
	(BUILT_IN_HSA_CURRENTWORKGROUPSIZE): Likewise.
	* hsa-gen.c (gen_hsa_insns_for_call): Handle BUILT_IN_HSA_GRIDSIZE.
---
 gcc/hsa-builtins.def | 4 ++++
 gcc/hsa-gen.c        | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
index dcd0c55..cc0409e 100644
--- a/gcc/hsa-builtins.def
+++ b/gcc/hsa-builtins.def
@@ -33,3 +33,7 @@ DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMID, "hsa_workitemid",
                  BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMABSID, "hsa_workitemabsid",
                  BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_GRIDSIZE, "hsa_gridsize",
+                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_HSA_BUILTIN (BUILT_IN_HSA_CURRENTWORKGROUPSIZE, "hsa_currentworkgroupsize",
+                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index f63608c..deb2a07 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5812,6 +5812,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_HSA_WORKITEMABSID:
       query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMABSID, hbb);
       break;
+    case BUILT_IN_HSA_GRIDSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_GRIDSIZE, hbb);
+      break;
+    case BUILT_IN_HSA_CURRENTWORKGROUPSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_CURRENTWORKGROUPSIZE, hbb);
+      break;
     case BUILT_IN_GOMP_BARRIER:
       hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER,
                                          BRIG_TYPE_NONE,
-- 
2.10.0
[hsa-branch 5/9] Properly detect variadic arguments
Hi,

this patch from Martin properly detects some variadic calls which we have failed to detect before during expansion to HSAIL.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Liska
	    Martin Jambor

	* hsa-gen.c (verify_function_arguments): Properly detect variadic
	arguments.
---
 gcc/hsa-gen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index efb87a0..ac83e9e 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3444,13 +3444,14 @@ gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
 static void
 verify_function_arguments (tree decl)
 {
+  tree type = TREE_TYPE (decl);
   if (DECL_STATIC_CHAIN (decl))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
                      "HSA does not support nested functions: %D", decl);
       return;
     }
-  else if (!TYPE_ARG_TYPES (TREE_TYPE (decl)))
+  else if (!TYPE_ARG_TYPES (type) || stdarg_p (type))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
                      "HSA does not support functions with variadic arguments "
-- 
2.10.0
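To show why a NULL argument-type list alone does not catch every variadic function, here is a toy model of the two conditions in the new check. Everything in it is hypothetical: the list type is a stand-in for GCC's tree lists, and the convention it encodes (a prototyped fixed-argument list ends with a void entry, a non-empty list without that terminator is variadic) is my understanding of GCC's representation rather than something taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a parameter type list.  */
enum ptype { T_VOID, T_INT, T_DOUBLE };

struct arg_list
{
  enum ptype type;
  const struct arg_list *next;
};

/* Assumed convention: a non-empty list that does not end with T_VOID
   describes a function taking variable arguments.  */
static bool
list_is_stdarg (const struct arg_list *args)
{
  if (args == NULL)
    return false;
  while (args->next != NULL)
    args = args->next;
  return args->type != T_VOID;
}

/* Mirror of the combined condition: reject functions without a
   prototype (NULL list) as well as variadic ones.  */
static bool
unsupported_argument_list (const struct arg_list *args)
{
  return args == NULL || list_is_stdarg (args);
}
```

Under this model `f (int, ...)` has a non-NULL list and so passed the old test, which is the case the added stdarg check covers.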
[hsa-branch 7/9] Ignore prefetch builtin
Hi,

this patch makes HSAIL expansion ignore prefetch built-ins. It is a bit less straightforward because we also need to handle cases where the call does not pass the gimple_call_builtin_p test because of argument type mismatches.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* hsa-gen.c (gen_hsa_insns_for_call): Ignore prefetch builtin.
---
 gcc/hsa-gen.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ad40087..8893a28 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5530,6 +5530,12 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+         fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (DECL_BUILT_IN_CLASS (function_decl)
+          && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
+        return;
+
       if (function_decl == NULL_TREE)
         {
           HSA_SORRY_AT (gimple_location (stmt),
@@ -5962,6 +5968,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
         gen_hsa_alloca (call, hbb);
         break;
       }
+    case BUILT_IN_PREFETCH:
+      break;
     default:
       {
         gen_hsa_insns_for_direct_call (stmt, hbb);
-- 
2.10.0
[hsa-branch 8/9] Fail instead of calling an unknown GOMP builtin
Hi,

this patch is a bit of a hack to make sure we do not emit calls to libgomp run-time functions which are not available on the HSA GPU side, such as run-time loop scheduling routines. If we fail at the caller side, we avoid issues with the finalizer looking at calls to non-existing functions.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* hsa-gen.c (gen_hsa_insns_for_call): Fail when encountering a GOMP
	builtin that we cannot process ourselves.
---
 gcc/hsa-gen.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 8893a28..fd0dbcd 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -5972,7 +5972,15 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
       break;
     default:
       {
-        gen_hsa_insns_for_direct_call (stmt, hbb);
+        tree name_tree = DECL_NAME (fndecl);
+        const char *s = IDENTIFIER_POINTER (name_tree);
+        size_t len = strlen (s);
+        if (len > 4 && (strncmp (s, "__builtin_GOMP_", 15) == 0))
+          HSA_SORRY_ATV (gimple_location (stmt),
+                         "support for HSA does not implement GOMP function %s",
+                         s);
+        else
+          gen_hsa_insns_for_direct_call (stmt, hbb);
         return;
       }
     }
-- 
2.10.0
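The name test in this patch is an ordinary string-prefix check. Below is a minimal standalone sketch of the same idea; the helper names are hypothetical and the prefix length is derived with strlen instead of being counted by hand.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if NAME starts with PREFIX.  strncmp stops at the
   terminating NUL, so a NAME shorter than PREFIX simply fails.  */
static bool
has_prefix (const char *name, const char *prefix)
{
  return strncmp (name, prefix, strlen (prefix)) == 0;
}

/* Hypothetical helper: true for names that look like libgomp
   builtins, which the caller would then refuse to expand.  */
static bool
is_gomp_builtin_name (const char *name)
{
  return has_prefix (name, "__builtin_GOMP_");
}
```

Since strncmp already copes with short strings, no separate length test is needed in this form.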
[hsa-branch 9/9] Fix another finalizer type complaint
Hi,

the following patch deals with a finalizer error issued when we have a register-register move of an HSAIL vector type. Apparently, such a move must obey the same rules as vector loads and stores.

Committed to the branch, queued for merge to trunk soon.

Thanks,

Martin

2016-10-03  Martin Jambor

	* hsa-gen.c (hsa_build_append_simple_mov): Use mem_type_for_type.
---
 gcc/hsa-gen.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index fd0dbcd..0b25f66 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2227,8 +2227,10 @@ hsa_reg_or_immed_for_gimple_op (tree op, hsa_bb *hbb)
 void
 hsa_build_append_simple_mov (hsa_op_reg *dest, hsa_op_base *src, hsa_bb *hbb)
 {
-  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
-                                             dest, src);
+  /* Moves of packed data between registers need to adhere to the same type
+     rules like when dealing with memory.  */
+  BrigType16_t tp = mem_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, tp, dest, src);
   if (hsa_op_reg *sreg = dyn_cast <hsa_op_reg *> (src))
     gcc_assert (hsa_type_bit_size (dest->m_type)
                 == hsa_type_bit_size (sreg->m_type));
-- 
2.10.0
Re: [hsa] depend nowait support for target
On Mon, Nov 23, 2015 at 03:16:42PM +0100, Jakub Jelinek wrote: > On Mon, Nov 23, 2015 at 03:12:05PM +0100, Martin Jambor wrote: > > +/* Thread routine to run a kernel asynchronously. */ > > + > > +static void * > > +run_kernel_asynchronously (void *thread_arg) > > +{ > > + struct async_run_info *info = (struct async_run_info *) thread_arg; > > + int device = info->device; > > + void *tgt_fn = info->tgt_fn; > > + void *tgt_vars = info->tgt_vars; > > + void **args = info->args; > > + void *async_data = info->async_data; > > + > > + free (info); > > + GOMP_OFFLOAD_run (device, tgt_fn, tgt_vars, args); > > + GOMP_PLUGIN_target_task_completion (async_data); > > + return NULL; > > Is this just a temporary hack to work-around the missing task.c/target.c > support for plugins that need polling (calling some hook) to determine > completion of the tasks, or there is no way to tell HSA to spawn something > asynchronously? > Short term it is ok this way. Basically yes. There is no way to tell HSA-run time to be notified of kernel completion. If libgomp provides a way to poll the device, I'll gladly use that instead. > > > + int err = pthread_create (&pt, NULL, &run_kernel_asynchronously, info); > > + if (err != 0) > > +GOMP_PLUGIN_fatal ("HSA asynchronous thread creation failed: %s", > > + strerror (err)); > > + err = pthread_detach (pt); > > + if (err != 0) > > +GOMP_PLUGIN_fatal ("Failed to detach a thread to run HRA kernel " > > + "asynchronously: %s", strerror (err)); > > HSA instead of HRA? > Oh, thanks. Will fix. Martin
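The pattern discussed in this exchange (hand a heap-allocated descriptor to a new thread, let the thread free it, run the blocking call, signal completion, and detach the thread) can be sketched without any HSA or libgomp dependencies. Everything below is a hypothetical illustration: `job_info`, `run_job`, `start_job_detached` and the demo callbacks are made-up names standing in for the patch's `async_run_info`, `run_kernel_asynchronously`, `GOMP_OFFLOAD_run` and `GOMP_PLUGIN_target_task_completion`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct async_run_info.  */
struct job_info
{
  void (*run) (void *);        /* Blocking work, e.g. a kernel launch.  */
  void (*completed) (void *);  /* Called once RUN has returned.  */
  void *data;                  /* Passed to both callbacks.  */
};

/* Thread routine: copy the job out of the heap block, release the block,
   then do the work and report completion.  */
static void *
run_job (void *thread_arg)
{
  struct job_info *info = thread_arg;

  assert (info != NULL);
  struct job_info job = *info;
  free (info);
  job.run (job.data);
  job.completed (job.data);
  return NULL;
}

/* Start RUN on a detached thread.  Return 0 on success or the error number
   reported by malloc failure or by the pthread functions.  */
static int
start_job_detached (void (*run) (void *), void (*completed) (void *),
                    void *data)
{
  struct job_info *info = malloc (sizeof *info);
  if (info == NULL)
    return ENOMEM;
  info->run = run;
  info->completed = completed;
  info->data = data;

  pthread_t pt;
  int err = pthread_create (&pt, NULL, run_job, info);
  if (err != 0)
    {
      /* The thread never started, so the descriptor is still ours.  */
      free (info);
      return err;
    }
  return pthread_detach (pt);
}

/* Tiny demo callbacks so the thread routine can be exercised directly.  */
static void demo_run (void *p) { *(int *) p += 1; }
static void demo_completed (void *p) { *(int *) p += 10; }
```

Because the thread is detached, the caller cannot join it; the completion callback is the only notification, which matches the role the completion hook plays in the quoted code.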
[hsa] omp_target_associate_ptr and omp_target_is_present on shared memory
Hi, when looking at why target-12.c and target-24.c in libgomp/testsuite/libgomp.c/ were failing, I found two other places in libgomp's target.c where shared-memory devices ought to be treated like the host. Committed to the branch. Thanks, Martin 2015-11-25 Martin Jambor libgomp/ * target.c (omp_target_associate_ptr): Return EINVAL for shared memory devices. (omp_target_is_present): Return 1 for shared memory devices. --- libgomp/target.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libgomp/target.c b/libgomp/target.c index f8a9803..b453c0c 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1922,7 +1922,8 @@ omp_target_is_present (void *ptr, int device_num) if (devicep == NULL) return 0; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return 1; gomp_mutex_lock (&devicep->lock); @@ -2146,7 +2147,8 @@ omp_target_associate_ptr (void *host_ptr, void *device_ptr, size_t size, if (devicep == NULL) return EINVAL; - if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) + if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400) + || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) return EINVAL; gomp_mutex_lock (&devicep->lock); -- 2.6.0
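Both hunks above add the same condition, so it may help to see it in isolation. The sketch below is only an illustration: the `CAP_*` bit values are made up (the real constants are defined in libgomp's headers) and `device_behaves_like_host` is a hypothetical name.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real GOMP_OFFLOAD_CAP_* constants are
   defined in libgomp and may differ.  */
enum
{
  CAP_SHARED_MEM = 1u << 0,
  CAP_OPENMP_400 = 1u << 2
};

/* Hypothetical helper: true if a device with CAPABILITIES should take the
   same early-return path as the host, i.e. it either lacks OpenMP 4.0
   offloading support or shares memory with the host.  */
static bool
device_behaves_like_host (unsigned int capabilities)
{
  return !(capabilities & CAP_OPENMP_400)
	 || (capabilities & CAP_SHARED_MEM) != 0;
}
```

With such a predicate, a shared-memory device that does support OpenMP 4.0 takes the same early return as a device with no OpenMP support at all, which is the behaviour the patch introduces.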
[hsa] Fix static local variable name conflict
Hi, the patch below makes libgomp/testsuite/libgomp.c/target-28.c pass on HSA, where it previously did not like the two static variables with the same name. Committed to the branch. Thanks, Martin 2015-11-25 Martin Jambor * hsa.c (hsa_get_declaration_name): Return ASM name for global variables. --- gcc/hsa.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/hsa.c b/gcc/hsa.c index 7c9e0f6..8ab5da7 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -700,6 +700,8 @@ hsa_get_declaration_name (tree decl) } else if (TREE_CODE (decl) == FUNCTION_DECL) return cgraph_node::get_create (decl)->asm_name (); + else if (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)) +return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); else return IDENTIFIER_POINTER (DECL_NAME (decl)); -- 2.6.0
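For readers who have not seen the test case, the kind of source that runs into this is easy to reproduce. The snippet below is a made-up reduction, not the contents of target-28.c: two functions each own a function-local static with the same source-level name. Both variables have the same DECL_NAME, so any symbol naming based on it collides, whereas the compiler-assigned assembler names are expected to be distinct.

```c
/* Two unrelated functions, each with a function-local static called
   "state".  The two objects are distinct even though they share a name.  */
static int
next_even (void)
{
  static int state = 0;
  state += 2;
  return state;
}

static int
next_odd (void)
{
  static int state = -1;
  state += 2;
  return state;
}
```

Calling the two functions alternately yields independent sequences (2, 4, ... and 1, 3, ...), which only works if each `state` ends up as its own symbol in the generated code.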
[hsa] Describe grid with target clauses
Hi, Jakub requested that I remove the grid description from new fields of the classes representing gimple omp statement and put them into special artificial clauses instead. This patch implements that, with one target clause per dimension (so up to three clauses) and each one describing both the grid size and group size along that dimension (hence the new clause type has two parameters). Committed to the branch, I will be preparing a new diff against the trunk shortly. Thanks, Martin 2015-11-30 Martin Jambor * gimple.c (gimple_omp_target_init_dimensions): Removed. * gimple.h (gimple_statement_omp_parallel_layout): Removed fields dimensions and kernel_dim. (gimple_omp_target_dimensions): Removed. (gimple_omp_target_grid_size): Likewise. (gimple_omp_target_grid_size_ptr): Likewise. (gimple_omp_target_set_grid_size): Likewise. (gimple_omp_target_workgroup_size): Likewise. (gimple_omp_target_workgroup_size_ptr): Likewise. (gimple_omp_target_set_workgroup_size): Likewise. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_. (scan_omp_target): Do not scan kernel_dim. (region_needs_kernel_p): Use clauses to recognize gridified kernels. (get_kernel_launch_attributes): Generate launch attributes from clauses. (get_target_arguments): Use clauses to recognize gridified kernels. (expand_target_kernel_body): Likewise. (attempt_target_gridification): Record grid description into clauses. * tree-core.h (omp_clause_code): New element OMP_CLAUSE__GRIDDIM_. (tree_omp_clause): New subcode dimension. * tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_. * tree.c (omp_clause_num_ops): Add number of operands of OMP_CLAUSE__GRIDDIM_. (omp_clause_code_name): Add name of OMP_CLAUSE__GRIDDIM_. (walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_. * tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New. (OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise. (OMP_CLAUSE_GRIDDIM_SIZE): Likewise. (OMP_CLAUSE_GRIDDIM_GROUP): Likewise. 
--- gcc/gimple.c| 11 --- gcc/gimple.h| 82 - gcc/omp-low.c | 72 ++- gcc/tree-core.h | 9 +- gcc/tree-pretty-print.c | 12 gcc/tree.c | 5 ++- gcc/tree.h | 11 +++ 7 files changed, 79 insertions(+), 123 deletions(-) diff --git a/gcc/gimple.c b/gcc/gimple.c index d876e90..4658f29 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -1098,17 +1098,6 @@ gimple_build_omp_target (gimple_seq body, int kind, tree clauses) return p; } -/* Set dimensions of TARGET to NUM and allocate kernel_dim array of the - statement with the appropriate number of elements. */ - -void -gimple_omp_target_init_dimensions (gomp_target *target, size_t num) -{ - gcc_assert (num > 0); - target->dimensions = num; - target->kernel_dim = ggc_cleared_vec_alloc (num); -} - /* Build a GIMPLE_OMP_TEAMS statement. BODY is the sequence of statements that will be executed. diff --git a/gcc/gimple.h b/gcc/gimple.h index 14e6cf6..4c4c799 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -661,21 +661,7 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT"))) Shared data argument. */ tree data_arg; - /* TODO: Revisit placement of the following two fields. On one hand, we - currently only use them on target construct. On the other, use on - parallel construct is also possible in the future. */ - /* [ WORD 11 ] */ - /* Number of elements in kernel_iter array. */ - size_t dimensions; - - /* [ WORD 12 ] */ - /* If target also contains a GPU kernel, it should be run with the - following grid sizes. */ - struct gimple_omp_target_grid_dim -* GTY((length ("%h.dimensions"))) kernel_dim; - - /* [ WORD 13 ] */ /* If set, this statement is part of a gridified kernel, its clauses need to be scanned and lowered but the statement should be discarded after lowering. 
*/ @@ -1504,7 +1490,6 @@ gomp_sections *gimple_build_omp_sections (gimple_seq, tree); gimple *gimple_build_omp_sections_switch (void); gomp_single *gimple_build_omp_single (gimple_seq, tree); gomp_target *gimple_build_omp_target (gimple_seq, int, tree); -void gimple_omp_target_init_dimensions (gomp_target *, size_t); gomp_teams *gimple_build_omp_teams (gimple_seq, tree); gomp_atomic_load *gimple_build_omp_atomic_load (tree, tree); gomp_atomic_store *gimple_build_omp_atomic_store (tree); @@ -5683,73 +5668,6 @@ gimple_omp_target_set_data_arg (gomp_target *omp_target_stmt, omp_target_stmt->data_arg = data_arg; } -/* Return the number of dimensions of kernel grid. */ - -static inline size_t -gimple_omp_target_dimensions (gomp_target *omp_target_stmt) -{ - return omp_target_stmt->d
[hsa] Use proper accesses to gimple_omp_for
Hi, when looking at the attempt_target_gridification function I realized I forgot to replace some of the early code with proper gimple statement access function calls. This patch addresses that. Committed to the branch. Thanks, Martin 2015-11-30 Martin Jambor * omp-low.c (attempt_target_gridification): Use proper access into iter array of the inner loop. --- gcc/omp-low.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 5933c60..bdf6539 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -17484,21 +17484,21 @@ attempt_target_gridification (gomp_target *target, gimple_stmt_iterator *gsi, size_t collapse = gimple_omp_for_collapse (inner_loop); for (size_t i = 0; i < collapse; i++) { - gimple_omp_for_iter iter = inner_loop->iter[i]; - walk_tree (&iter.initial, remap_prebody_decls, &wi, NULL); - walk_tree (&iter.final, remap_prebody_decls, &wi, NULL); - - tree itype, type = TREE_TYPE (iter.index); + tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i)); if (POINTER_TYPE_P (type)) itype = signed_type_for (type); else itype = type; - enum tree_code cond_code = iter.cond; - tree n1 = iter.initial; - tree n2 = iter.final; + enum tree_code cond_code = gimple_omp_for_cond (inner_loop, i); + tree n1 = unshare_expr (gimple_omp_for_initial (inner_loop, i)); + walk_tree (&n1, remap_prebody_decls, &wi, NULL); + tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i)); + walk_tree (&n2, remap_prebody_decls, &wi, NULL); adjust_for_condition (loc, &cond_code, &n2); - tree step = get_omp_for_step_from_incr (loc, iter.incr); + tree step; + step = get_omp_for_step_from_incr (loc, +gimple_omp_for_incr (inner_loop, i)); n1 = force_gimple_operand_gsi (gsi, fold_convert (type, n1), true, NULL_TREE, true, GSI_SAME_STMT); n2 = force_gimple_operand_gsi (gsi, fold_convert (itype, n2), true, -- 2.6.0
[hsa] Use gimplify_expr in gridification
Hi, doing some more testing of the branch and combining two of my testcases I came across a bug where temporaries created by force_gimple_operand_gsi were not added to the proper bind and thus were subsequently re-mapped to error_mark when the target construct was within some other omp construct. Fixed with this patch, where pop_gimplify_context does the right thing like at other places in omp-low.c. Committed to the branch. Thanks, Martin 2015-11-30 Martin Jambor * omp-low.c (attempt_target_gridification): Use gimplify_expr. --- gcc/omp-low.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index bdf6539..7fbdcdf 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -17481,6 +17481,7 @@ attempt_target_gridification (gomp_target *target, gimple_stmt_iterator *gsi, gpukernel); walk_tree (&group_size, remap_prebody_decls, &wi, NULL); + push_gimplify_context (); size_t collapse = gimple_omp_for_collapse (inner_loop); for (size_t i = 0; i < collapse; i++) { @@ -17499,30 +17500,32 @@ attempt_target_gridification (gomp_target *target, gimple_stmt_iterator *gsi, tree step; step = get_omp_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i)); - n1 = force_gimple_operand_gsi (gsi, fold_convert (type, n1), true, -NULL_TREE, true, GSI_SAME_STMT); - n2 = force_gimple_operand_gsi (gsi, fold_convert (itype, n2), true, -NULL_TREE, -true, GSI_SAME_STMT); + gimple_seq tmpseq = NULL; + n1 = fold_convert (itype, n1); + n2 = fold_convert (itype, n2); tree t = build_int_cst (itype, (cond_code == LT_EXPR ? 
-1 : 1)); t = fold_build2 (PLUS_EXPR, itype, step, t); t = fold_build2 (PLUS_EXPR, itype, t, n2); - t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, n1)); + t = fold_build2 (MINUS_EXPR, itype, t, n1); if (TYPE_UNSIGNED (itype) && cond_code == GT_EXPR) t = fold_build2 (TRUNC_DIV_EXPR, itype, fold_build1 (NEGATE_EXPR, itype, t), fold_build1 (NEGATE_EXPR, itype, step)); else t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step); - t = fold_convert (uint32_type_node, t); - tree gs = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true, - GSI_SAME_STMT); + tree gs = fold_convert (uint32_type_node, t); + gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue); + if (!gimple_seq_empty_p (tmpseq)) + gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT); + tree ws; if (i == 0 && group_size) { ws = fold_convert (uint32_type_node, group_size); - ws = force_gimple_operand_gsi (gsi, ws, true, NULL_TREE, true, -GSI_SAME_STMT); + tmpseq = NULL; + gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue); + if (!gimple_seq_empty_p (tmpseq)) + gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT); } else ws = build_zero_cst (uint32_type_node); @@ -17534,7 +17537,7 @@ attempt_target_gridification (gomp_target *target, gimple_stmt_iterator *gsi, OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target); gimple_omp_target_set_clauses (target, c); } - + pop_gimplify_context (tgt_bind); delete declmap; return; } -- 2.6.0
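The tree-building code in this hunk computes the grid size of one dimension as the number of iterations of the corresponding loop. Written as plain integer arithmetic for the signed case, the expression `(step + adj + n2 - n1) / step` looks as follows. This is a sketch under stated assumptions: `trip_count` and its parameters are illustrative names, the loop condition is assumed to have already been normalized to a strict `<` or `>`, and the separate unsigned `GT_EXPR` branch of the patch is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of iterations of
     for (i = n1; i < n2; i += step)   when COUNTS_UP is true, or
     for (i = n1; i > n2; i += step)   when COUNTS_UP is false (STEP < 0),
   using the same rounding trick as the hunk above: add STEP minus one
   (or plus one for a decreasing loop) before the truncating division.
   Assumes the loop runs at least zero times in the stated direction.  */
static long
trip_count (long n1, long n2, long step, bool counts_up)
{
  assert (step != 0);
  long adj = counts_up ? -1 : 1;
  return (step + adj + n2 - n1) / step;
}
```

For example, a loop from 0 up to (but excluding) 10 in steps of 3 visits 0, 3, 6 and 9, and the formula gives (3 - 1 + 10 - 0) / 3 = 4.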
[hsa] Useful checking assert in scan_omp_1_op
Hi, I have found the following checking assert very useful when debugging omp lowering issues, so I have added it to the hsa branch. I hope that nobody will mind, but it is of course not essential, should anyone object. Thanks, Martin 2015-12-03 Martin Jambor * omp-low.c (scan_omp_1_op): Add checking assert that we are not re-mapping to ERROR_MARK. --- gcc/omp-low.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 8854df7..05d8901 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -3731,7 +3731,11 @@ scan_omp_1_op (tree *tp, int *walk_subtrees, void *data) case LABEL_DECL: case RESULT_DECL: if (ctx) - *tp = remap_decl (t, &ctx->cb); + { + tree repl = remap_decl (t, &ctx->cb); + gcc_checking_assert (TREE_CODE (repl) != ERROR_MARK); + *tp = repl; + } break; default: -- 2.6.3
[hsa] Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses
Hi, this is a fix to the last "last" ICE of the hsa branch. The problem turned out not to be in the gridification itself but, depending on your point of view, in the gimple and tree walking infrastructure or in function copy_gimple_seq_and_replace_locals from tree-inline.c on which hsa gridification relies. The issue is that in between gimplification and omplow pass, there can be gimple sequences attached to OMP_CLAUSE trees that are attached to omp statements and that are neither copied by gimple_seq_copy nor walked by walk_gimple_seq. While the correct solution would probably be to extend tree and gimple walkers to handle them, that would be a big change. I talked with Jakub about this yesterday on IRC and he suggested that I enhance the internal walkers of copy_gimple_seq_and_replace_locals to deal with this situation. Even though that leaves gimple_seq_copy, walk_gimple_seq and others technically incorrect, that is what I have done in the patch below, which fixes my last ICEs and which I have already committed to the branch. Any feedback is of course very much appreciated, Martin 2015-12-03 Martin Jambor * tree-inline.c (duplicate_remap_omp_clause_seq): New function. (replace_locals_op): Duplicate gimple sequences in OMP clauses. --- gcc/tree-inline.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index ebab189..15141dc 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip, return NULL_TREE; } +static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq, + struct walk_stmt_info *wi); /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local. 
Using the splay_tree pointed to by ST (which is really a `splay_tree'), @@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void *data) TREE_OPERAND (expr, 3) = NULL_TREE; } } + else if (TREE_CODE (expr) == OMP_CLAUSE) +{ + /* Before the omplower pass completes, some OMP clauses can contain +sequences that are neither copied by gimple_seq_copy nor walked by +walk_gimple_seq. To make copy_gimple_seq_and_replace_locals work even +in those situations, we have to copy and process them explicitely. */ + + if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE) + { + gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr); + seq = duplicate_remap_omp_clause_seq (seq, wi); + OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq; + } + else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR) + { + gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr); + seq = duplicate_remap_omp_clause_seq (seq, wi); + OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq; + } + else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION) + { + gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr); + seq = duplicate_remap_omp_clause_seq (seq, wi); + OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq; + seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr); + seq = duplicate_remap_omp_clause_seq (seq, wi); + OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq; + } +} /* Keep iterating. */ return NULL_TREE; @@ -5200,6 +5231,18 @@ replace_locals_stmt (gimple_stmt_iterator *gsip, return NULL_TREE; } +/* Create a copy of SEQ and remap all decls in it. */ + +static gimple_seq +duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi) +{ + /* If there are any labels in OMP sequences, they can be only referred to in + the sequence itself and therefore we can do both here. 
*/ + walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi); + gimple_seq copy = gimple_seq_copy (seq); + walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi); + return copy; +} /* Copies everything in SEQ and replaces variables and labels local to current_function_decl. */ -- 2.6.3
[hsa 0/10] Merge of HSA branch
Hi,

I'm sorry it took me more than a month to come up with another round of patches aiming at merging the HSA branch into the trunk. Keeping up to date with the latest changes in the OpenMP 4.5 area was strenuous and we have discovered and fixed a few bugs as I intensified my testing efforts.

While those are the main areas where this patch set differs from the previous one, I have of course addressed the feedback I got the last time, including implementing device-specific OpenMP target arguments, moving kernel grid size from gimple class fields to new artificial clauses and disabling the vectorizer for HSA functions using DECL_FUNCTION_SPECIFIC_OPTIMIZATION rather than extra code in respective pass gates.

Because I have not been able to come up with any solution to the failing libgomp/testsuite/libgomp.c++/target-2.C, I have disabled use of dynamic parallelism in this merge (I keep it on the branch) and therefore rely entirely on the gridification process to run loops on the accelerator, because gridified constructs do not have this issue (passing private symbols by reference).

HSA tests are still missing, I would need some guidance as to how to best implement them (especially to test gridification, which of course does not happen for other accelerators).

There are no failing testcases if HSA is not configured. If it is, there are some, all of which fall into one of the following categories:

1) HSA cannot compile a function for one reason or another (the most common cause is the inability of HSA to take an address of a function or make an indirect call) and gives a warning, which is regarded as an "excess error" by dejagnu.

2) When HSAIL is not emitted for a function, libgomp runs a host fallback instead of it. When the test queries omp_is_initial_device and asserts it returns false, the test fails.

3) There are still a few failing OpenACC tests, but those just should not be run.

Of course, the patch set bootstraps fine on x86_64-linux with or without configured HSA.
Any feedback is welcome. Thanks, Martin
[hsa 1/10] Configury changes and new options
Hi,

this patch contains changes to the configuration mechanism and offload bits, so that users can build compilers with HSA support. It plays nicely with other accelerators despite using an altogether different implementation approach. I have also added to it definitions of the new options and parameters, since at least one hunk in common.opt is highly related. -fdisable-hsa-gridification has disappeared, otherwise very little has changed since the last submission.

With this patch, the user can request HSA support by including the string "hsa" among the requested accelerators in --enable-offload-targets. This will cause the compiler to start producing HSAIL for target OpenMP constructs/functions and the hsa libgomp plugin to be built. Because the plugin needs to use the HSA run-time library, I have introduced options --with-hsa-runtime (and the more precise --with-hsa-runtime-include and --with-hsa-runtime-lib) to help find it. The open-source HSA run-time available on GitHub is binary compatible with the closed-source one. The closed-source one, however, also contains the finalizer and so is the one that needs to be used for all practical purposes. I am regularly asking AMD to keep their promise and open source the finalizer too.

One catch, however, is that there is no offload compiler for HSA and so the wrapper should not attempt to look for it (that is what the hunk in lto-wrapper.c does). Moreover, when HSA is the only configured accelerator, it is wasteful to output LTO sections with byte-code, and therefore the configury does not set the ENABLE_OFFLOADING macro in that case. Finally, when the compiler has been configured for HSA but the user disables it by omitting it in the -foffload compiler option, we need to observe that decision. That is what the opts.c hunk does.

As far as the options are concerned, the patch adds a new warning -Whsa, which we emit whenever we fail to produce HSAIL for some source code.
It is on by default but warnings are of course only emitted by HSAIL generating code so will never affect anybody who does not use both an HSA-enabled compiler and OpenMP 4 device constructs. Then there is a new parameter hsa-gen-debug-stores, which will be obsolete once HSA run-time supports debugging traps. Before that, we have to make do with debugging stores to memory at defined places, which however can cost speed in benchmarks. So we only enable them with this parameter. We decided to make it a parameter rather than a switch to emphasize the fact it will go away and to possibly allow us to select different levels of verbosity of the stores in the future. Any feedback is very appreciated, Martin 2015-12-04 Martin Jambor gcc/ * Makefile.in (OBJS): Add new source files. (GTFILES): Add hsa.c. * config.in (ENABLE_HSA): New. * configure.ac: Treat hsa differently from other accelerators. (OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to $enable_offloading. (ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa. * doc/install.texi (Configuration): Document --with-hsa-runtime, --with-hsa-runtime-include and --with-hsa-runtime-lib. * lto-wrapper.c (compile_images_for_offload_targets): Do not attempt to invoke offload compiler for hsa accelerator. * opts.c (common_handle_option): Determine whether HSA offloading should be performed. * common.opt (disable_hsa): New variable. (-Whsa): New warning. * doc/invoke.texi (-Whsa): Document. (hsa-gen-debug-stores): Likewise. * params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter. libgomp/plugin/ * Makefrag.am: Add HSA plugin requirements. * configfrag.ac (HSA_RUNTIME_INCLUDE): New variable. (HSA_RUNTIME_LIB): Likewise. (HSA_RUNTIME_CPPFLAGS): Likewise. (HSA_RUNTIME_INCLUDE): New substitution. (HSA_RUNTIME_LIB): Likewise. (HSA_RUNTIME_LDFLAGS): Likewise. (hsa-runtime): New configure option. (hsa-runtime-include): Likewise. (hsa-runtime-lib): Likewise. (PLUGIN_HSA): New substitution variable. 
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new configure options. (PLUGIN_HSA_CPPFLAGS): Likewise. (PLUGIN_HSA_LDFLAGS): Likewise. (PLUGIN_HSA_LIBS): Likewise. Check that we have access to HSA run-time. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index bee2879..5fe73a7 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1296,6 +1296,11 @@ OBJS = \ graphite-sese-to-poly.o \ gtype-desc.o \ haifa-sched.o \ + hsa.o \ + hsa-gen.o \ + hsa-regalloc.o \ + hsa-brig.o \ + hsa-dump.o \ hw-doloop.o \ hwint.o \ ifcvt.o \ @@ -1320,6 +1325,7 @@ OBJS = \ ipa-icf.o \ ipa-icf-gimple.o \ ipa-reference.o \ + ipa-hsa.o \ ipa-ref.o \ ipa-utils.o \ ipa.o \ @@ -2401,6 +2407,7 @@ GTFILES = $(CPP
[hsa 2/10] Modifications to libgomp proper
Hi, The patch below contains all changes to libgomp files except for the hsa plugin (which is in the following patch). The changes can be roughly divided into three categories. First, it contains changes that are necessary to support shared-memory devices. In the majority of cases this means treating them like the host fallback because there is no need to copy, host malloc can be used for allocating etc. It also means that GOMP_target_ext and gomp_target_task_fn should not be remapping arguments but should pass to the plugin the same thing the host fallback function would receive. Second, because the GCC HSA backend often does not emit HSAIL for a function it knows it cannot handle, these two functions need to gracefully handle the case when there is no device implementation of a particular function available by doing host fallback too. Third, the patch implements the libgomp part of the device-specific arguments passed to GOMP_target as requested by Jakub (well, some are actually for all devices but that is what we call them). Because of nowait target constructs, the arguments have proliferated into tasking too, as did firstprivate copies. Any feedback will be greatly appreciated, Martin 2015-12-04 Martin Jambor Martin Liska include/ * gomp-constants.h (GOMP_DEVICE_HSA): New macro. (GOMP_VERSION_HSA): Likewise. (GOMP_TARGET_ARG_DEVICE_MASK): Likewise. (GOMP_TARGET_ARG_DEVICE_ALL): Likewise. (GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise. (GOMP_TARGET_ARG_ID_MASK): Likewise. (GOMP_TARGET_ARG_NUM_TEAMS): Likewise. (GOMP_TARGET_ARG_THREAD_LIMIT): Likewise. (GOMP_TARGET_ARG_VALUE_SHIFT): Likewise. (GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise. (GOMP_kernel_launch_attributes): New type. (GOMP_hsa_kernel_dispatch): New type. libgomp/ * libgomp-plugin.h (offload_target_type): New element OFFLOAD_TARGET_TYPE_HSA. * libgomp.h (gomp_target_task): New field args. (bool gomp_create_target_task): Updated. (gomp_device_descr): Extra parameter of run_func and async_run_func, new field can_run_func. 
* libgomp_g.h (GOMP_target_ext): Change prototype. * oacc-host.c (host_run): Added a new parameter args. * target.c (gomp_target_fallback_firstprivate): New function. (gomp_target_fallback_firstprivate): Use gomp_target_fallback_firstprivate. (gomp_get_target_fn_addr): Allow returning NULL for shared memory devices. (GOMP_target): Do host fallback for all shared memory devices. Do not pass any args to plugins. (GOMP_target_ext): Add new parameter args. Allow host fallback if device shares memory. Do not remap data if device has shared memory. (gomp_target_task_fn): Likewise. Also Treat shared memory devices like host fallback for mappings. (GOMP_target_data): Treat shared memory devices like host fallback. (GOMP_target_data_ext): Likewise. (GOMP_target_update): Likewise. (GOMP_target_update_ext): Likewise. Also pass NULL as args to gomp_create_target_task. (GOMP_target_enter_exit_data): Likewise. (omp_target_alloc): Treat shared memory devices like host fallback. (omp_target_free): Likewise. (omp_target_is_present): Likewise. (omp_target_memcpy): Likewise. (omp_target_memcpy_rect): Likewise. (omp_target_associate_ptr): Likewise. (gomp_load_plugin_for_device): Also load can_run. * task.c (GOMP_PLUGIN_target_task_completion): Free firstprivate_copies. (gomp_create_target_task): Accept new argument args and store it to ttask. liboffloadmic/plugin * libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused parameter. (GOMP_OFFLOAD_run): Likewise. 
diff --git a/include/gomp-constants.h b/include/gomp-constants.h index dffd631..1dae474 100644 --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -176,6 +176,7 @@ enum gomp_map_kind #define GOMP_DEVICE_NOT_HOST 4 #define GOMP_DEVICE_NVIDIA_PTX 5 #define GOMP_DEVICE_INTEL_MIC 6 +#define GOMP_DEVICE_HSA7 #define GOMP_DEVICE_ICV-1 #define GOMP_DEVICE_HOST_FALLBACK -2 @@ -201,6 +202,7 @@ enum gomp_map_kind #define GOMP_VERSION 0 #define GOMP_VERSION_NVIDIA_PTX 1 #define GOMP_VERSION_INTEL_MIC 0 +#define GOMP_VERSION_HSA 0 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV)) #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x) @@ -228,4 +230,74 @@ enum gomp_map_kind #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x) #define GOMP_LAUNCH_OP_MAX 0x +/* Bitmask to apply in order to find out the intended device of a target + argument. */ +#define GOMP_TARGET_ARG_DEVICE_MASK((1 << 7)
[hsa 3/10] HSA libgomp plugin
Hi, the patch below adds the HSA-specific plugin for libgomp. The plugin implements the interface mandated by libgomp and takes care of finding any available HSA devices, finalizing HSAIL code and running it on HSA-capable GPUs. The plugin does not really implement any data movement functions (it implements them with a fatal error call) because memory is shared in HSA environments and the previous patch has modified libgomp proper not to call those functions on devices with this capability. The changes since the last submission include version checks, receiving grid sizes through a device-specific parameter and support for asynchronous execution. Any feedback will be greatly appreciated, Martin 2015-12-04 Martin Jambor Martin Liska * plugin/plugin-hsa.c: New file. diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c new file mode 100644 index 000..b132954 --- /dev/null +++ b/libgomp/plugin/plugin-hsa.c @@ -0,0 +1,1449 @@ +/* Plugin for HSAIL execution. + + Copyright (C) 2013-2015 Free Software Foundation, Inc. + + Contributed by Martin Jambor and + Martin Liska . + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#include +#include +#include +#include +#include "libgomp-plugin.h" +#include "gomp-constants.h" +#include "hsa.h" +#include "hsa_ext_finalize.h" +#include "dlfcn.h" + +/* Part of the libgomp plugin interface. Return the name of the accelerator, + which is "hsa". */ + +const char * +GOMP_OFFLOAD_get_name (void) +{ + return "hsa"; +} + +/* Part of the libgomp plugin interface. Return the specific capabilities the + HSA accelerator have. */ + +unsigned int +GOMP_OFFLOAD_get_caps (void) +{ + return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400; +} + +/* Part of the libgomp plugin interface. Identify as HSA accelerator. */ + +int +GOMP_OFFLOAD_get_type (void) +{ + return OFFLOAD_TARGET_TYPE_HSA; +} + +/* Return the libgomp version number we're compatible with. There is + no requirement for cross-version compatibility. */ + +unsigned +GOMP_OFFLOAD_version (void) +{ + return GOMP_VERSION; +} + +/* Flag to decide whether print to stderr information about what is going on. + Set in init_debug depending on environment variables. */ + +static bool debug; + +/* Flag to decide if the runtime should suppress a possible fallback to host + execution. */ + +static bool suppress_host_fallback; + +/* Initialize debug and suppress_host_fallback according to the environment. */ + +static void +init_enviroment_variables (void) +{ + if (getenv ("HSA_DEBUG")) +debug = true; + else +debug = false; + + if (getenv ("HSA_SUPPRESS_HOST_FALLBACK")) +suppress_host_fallback = true; + else +suppress_host_fallback = false; +} + +/* Print a logging message with PREFIX to stderr if HSA_DEBUG value + is set to true. */ + +#define HSA_LOG(prefix, ...) 
\ + do \ + { \ +if (debug) \ + { \ + fprintf (stderr, prefix); \ + fprintf (stderr, __VA_ARGS__); \ + } \ + } \ + while (false); + +/* Print a debugging message to stderr. */ + +#define HSA_DEBUG(...) HSA_LOG ("HSA debug: ", __VA_ARGS__) + +/* Print a warning message to stderr. */ + +#define HSA_WARNING(...) HSA_LOG ("HSA warning: ", __VA_ARGS__) + +/* Print HSA warning STR with an HSA STATUS code. */ + +static void +hsa_warn (const char *str, hsa_status_t status) +{ + if (!debug) +return; + + const char* hsa_error; + hsa_status_string (status, &hsa_error); + + unsigned l = strlen (hsa_error); + + char *err = GOMP_PLUGIN_malloc (sizeof (char) * l); + memcpy (err, hsa_error, l - 1); + err[l] = '\0'; + + fprintf (stderr, "HSA warning: %s (%s)\n", str, err); + + free (err); +} + +/* Report a fatal error STR together with the HSA error corresponding to STATUS + and terminate execution of the current pr
[hsa 4/10] Merge of HSA branch
Subject: Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

Hi,

this is https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00477.html with the early return requested by Jakub. Please refer to that previous email for an explanation of why it is necessary.

Thanks,

2015-12-03  Martin Jambor

	* tree-inline.c (duplicate_remap_omp_clause_seq): New function.
	(replace_locals_op): Duplicate gimple sequences in OMP clauses.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ebab189..dea23c7 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
+static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq,
+                                                  struct walk_stmt_info *wi);
 /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local.
    Using the splay_tree pointed to by ST (which is really a `splay_tree'),
@@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void *data)
           TREE_OPERAND (expr, 3) = NULL_TREE;
         }
     }
+  else if (TREE_CODE (expr) == OMP_CLAUSE)
+    {
+      /* Before the omplower pass completes, some OMP clauses can contain
+         sequences that are neither copied by gimple_seq_copy nor walked by
+         walk_gimple_seq.  To make copy_gimple_seq_and_replace_locals work even
+         in those situations, we have to copy and process them explicitly.  */
+
+      if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE)
+        {
+          gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr);
+          seq = duplicate_remap_omp_clause_seq (seq, wi);
+          OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq;
+        }
+      else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR)
+        {
+          gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr);
+          seq = duplicate_remap_omp_clause_seq (seq, wi);
+          OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq;
+        }
+      else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+        {
+          gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr);
+          seq = duplicate_remap_omp_clause_seq (seq, wi);
+          OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq;
+          seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr);
+          seq = duplicate_remap_omp_clause_seq (seq, wi);
+          OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq;
+        }
+    }
   /* Keep iterating.  */
   return NULL_TREE;
@@ -5200,6 +5231,21 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
+/* Create a copy of SEQ and remap all decls in it.  */
+
+static gimple_seq
+duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
+{
+  if (!seq)
+    return NULL;
+
+  /* If there are any labels in OMP sequences, they can be only referred to in
+     the sequence itself and therefore we can do both here.  */
+  walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi);
+  gimple_seq copy = gimple_seq_copy (seq);
+  walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi);
+  return copy;
+}
 /* Copies everything in SEQ and replaces variables and labels local to
    current_function_decl.  */
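To illustrate why the clause sequences need explicit duplication, here is a minimal toy model in plain C. It is only a sketch: none of the `toy_*` types or functions exist in GCC, and the "remapping" is simply an integer offset. The point it tries to show is the one from the patch comment: a nested sequence that hangs off a clause is not reached by a flat copy, so the copier has to recurse into it (and an empty nested sequence simply copies to NULL, like the early return in the patch).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins, NOT GCC types: a statement refers to one variable id and
   may carry a clause that owns a nested statement sequence.  */
struct toy_stmt;

struct toy_clause
{
  struct toy_stmt *seq;        /* nested sequence hanging off the clause */
};

struct toy_stmt
{
  int var;                     /* id of the variable the statement uses */
  struct toy_clause *clause;   /* optional clause, may be NULL */
  struct toy_stmt *next;
};

/* Hypothetical remapping: every local variable id is shifted by OFFSET.  */
static int
toy_remap_var (int var, int offset)
{
  return var + offset;
}

/* Copy SEQ, remap the variables in the copy and also duplicate the
   sequences owned by clauses instead of sharing them with the original.
   A NULL sequence copies to NULL.  */
struct toy_stmt *
toy_copy_and_remap (const struct toy_stmt *seq, int offset)
{
  struct toy_stmt *head = NULL, **tail = &head;

  for (; seq; seq = seq->next)
    {
      struct toy_stmt *copy = malloc (sizeof *copy);
      assert (copy != NULL);
      copy->var = toy_remap_var (seq->var, offset);
      copy->clause = NULL;
      copy->next = NULL;

      if (seq->clause)
        {
          copy->clause = malloc (sizeof *copy->clause);
          assert (copy->clause != NULL);
          /* Recurse so the nested sequence is copied and remapped too.  */
          copy->clause->seq = toy_copy_and_remap (seq->clause->seq, offset);
        }

      *tail = copy;
      tail = &copy->next;
    }
  return head;
}
```

With a one-statement sequence whose clause owns one nested statement, the copy ends up with its own clause and its own nested statement, and the original is left untouched.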
[hsa 5/10] OpenMP lowering/expansion changes (gridification)
Hi,

the patch in this email contains the changes to make our OpenMP lowering and expansion machinery produce GPU kernels for a certain limited class of loops. The plan is to make that class quite a bit bigger, but only the following is ready for submission now.

Basically, whenever the compiler configured for HSAIL generation encounters the following pattern:

  #pragma omp target
  #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
  #pragma omp distribute parallel for firstprivate(n,j) private(i) other_sharing_clauses()
    for (i = j + 1; i < n; i += 3)
      some_loop_body

it creates a copy of the entire target body and expands it slightly differently for concurrent execution on a GPU. Note that both teams and distribute constructs are mandatory. Moreover, currently the distribute has to be in a combined statement with the inner for construct. And there are quite a few other restrictions which I hope to alleviate over the next year, most notably reductions and collapse clause now prevent gridification (see the new function target_follows_gridifiable_pattern to find out what exactly the restrictions are).

The first phase of the "gridification" process is run before the omp "scanning" phase. We look for the pattern above, and if we encounter one, we copy its entire body into a new gimple statement GIMPLE_OMP_GPUKERNEL. Within it, we mark the teams, distribute and parallel constructs with a new flag "kernel_phony." This flag will then make the OMP lowering phase process their sharing clauses like usual, but the statements representing the constructs will be removed at lowering (and thus will never be expanded). The resulting wasteful repackaging of data is nicely cleaned by our optimizers even at -O1.

At expansion time, we identify gomp_target statements with a kernel and expand the kernel into a special function, with the loop represented by the GPU grid and not control flow. Afterwards, the normal body of the target is expanded as usual.
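For reference, a complete function following the pattern described above could look like the sketch below. The function name, the loop body, the `map` clause and the `thread_limit` value of 64 are assumptions made for illustration only; the directive nesting and the sharing clauses are the ones from the pattern. It uses no reduction and no collapse clause, since those are named above as preventing gridification. Whether a given compiler actually gridifies it depends on the further restrictions checked by target_follows_gridifiable_pattern, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Add 1.0f to every third element of DATA, starting at index J + 1 and
   stopping before N.  Name and body are illustrative, not from the patch.  */
void
bump_every_third (float *data, int j, int n)
{
  int i;

  assert (data != NULL);

#pragma omp target map(tofrom: data[0:n])
#pragma omp teams thread_limit(64)
#pragma omp distribute parallel for firstprivate(n, j) private(i)
  for (i = j + 1; i < n; i += 3)
    data[i] += 1.0f;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop runs sequentially on the host with the same result, which makes the function easy to check independently of any offloading setup.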
Finally, we need to take the grid dimensions stored within new fields of the target statement by the first phase, store them in a structure and pass them in a device-specific argument to GOMP_target_ext. The patch thus also implements the compiler part of device-specific target arguments as discussed on the mailing list and IRC.

Originally, when I started with the above pattern matching, I did not allow any other gimple statements in between the respective omp constructs. That however proved to be too restrictive for two reasons. First, statements in pre-bodies of both distribute and for loops needed to be accounted for when calculating the kernel grid size (which is done before the target statement itself) and second, Fortran parameter dereferences happily result in interleaving statements when there were none in the user source code.

Therefore, I allow register-type stores to local non-addressable variables in pre-bodies and also in between the OMP constructs. All of them are copied in front of the target statement and either used for grid size calculation or removed as useless by later optimizations.

I hope that eventually I managed to write the gridification in a way that interferes very little with the rest of the OMP pipeline and yet re-implements only the bare minimum of functionality that is already there. Any feedback is of course still very welcome.

Thanks,

Martin

2015-12-04  Martin Jambor

	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GPUKERNEL.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
	GF_OMP_FOR_KIND_KERNEL_BODY.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GPUKERNEL.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_gpukernel): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GPUKERNEL.
	* gimple.def (GIMPLE_OMP_TEAMS): Moved into its own layout.
	(GIMPLE_OMP_GPUKERNEL): New.
	* gimple.h (gf_mask): Added GF_OMP_FOR_KIND_KERNEL_BODY.
	(gomp_for): New field kernel_phony.
	(gimple_statement_omp_parallel_layout): Likewise.
	(gimple_statement_omp_single_layout): Updated comments.
	(gomp_teams): New field kernel_phony.
	(gimple_build_omp_gpukernel): Declare.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GPUKERNEL.
	(gimple_omp_for_kernel_phony): New.
	(gimple_omp_for_set_kernel_phony): Likewise.
	(gimple_omp
[hsa 6/10] Pass manager changes
Hi,

the pass manager changes required for HSA have already been committed to trunk so all that remains are these additions to the pass pipeline.

Thanks,

Martin

2015-12-04  Martin Jambor
	    Martin Liska

	* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
	* tree-pass.h (make_pass_gen_hsail): Declare.
	(make_pass_ipa_hsa): Likewise.

diff --git a/gcc/passes.def b/gcc/passes.def
index 28cb4c1..0f0f36d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -144,6 +144,7 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_target_clone);
+  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -377,6 +378,7 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
+  NEXT_PASS (pass_gen_hsail);
   NEXT_PASS (pass_expand);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9704918..30127d4 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -467,6 +467,7 @@ extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels2 (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@@ -491,6 +492,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
[hsa 7/10] IPA-HSA pass
Hi,

when a target construct is gridified, the HSA GPU function is associated with the CPU function throughout the compilation, so that they can be registered as a pair in libgomp. Ungridified target constructs and, more importantly, "pragma omp declare target" marked functions emerge out of OMP expansion as one gimple function for both the host and the accelerator. However, at some point we need to create a special HSA function representation so that we can modify the behavior of a (very) few optimization passes for them.

Both are done by the following new IPA pass, which creates new HSA clones in these cases. Moreover, it redirects the appropriate call graph edges to be in between HSA implementations, marks HSA clones with the flatten attribute to minimize any call overhead (which is much more significant on GPUs) and makes sure both the CPU and GPU functions are coupled together and remain in the same LTO partition so that they can be registered together to libgomp.

Thanks,

Martin

2015-12-04  Martin Liska
	    Martin Jambor

	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* lto-partition.c: Include "hsa.h"
	(add_symbol_to_partition_1): Put hsa implementations in the same
	partition as host implementations.
	* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 000..5b3e563
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* Callgraph based analysis of static variables.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Martin Liska
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +/* Interprocedural HSA pass is responsible for creation of HSA clones. + For all these HSA clones, we emit HSAIL instructions and pass processing + is terminated. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "is-a.h" +#include "hash-set.h" +#include "vec.h" +#include "tree.h" +#include "tree-pass.h" +#include "function.h" +#include "basic-block.h" +#include "gimple.h" +#include "dumpfile.h" +#include "gimple-pretty-print.h" +#include "tree-streamer.h" +#include "stringpool.h" +#include "cgraph.h" +#include "print-tree.h" +#include "symbol-summary.h" +#include "hsa.h" + +namespace { + +/* If NODE is not versionable, warn about not emiting HSAIL and return false. + Otherwise return true. */ + +static bool +check_warn_node_versionable (cgraph_node *node) +{ + if (!node->local.versionable) +{ + warning_at (EXPR_LOCATION (node->decl), OPT_Whsa, + "could not emit HSAIL for function %s: function cannot be " + "cloned", node->name ()); + return false; +} + return true; +} + +/* The function creates HSA clones for all functions that were either + marked as HSA kernels or are callable HSA functions. Apart from that, + we redirect all edges that come from an HSA clone and end in another + HSA clone to connect these two functions. */ + +static unsigned int +process_hsa_functions (void) +{ + struct cgraph_node *node; + + if (hsa_summaries == NULL) +hsa_summaries = new hsa_summary_t (symtab); + + FOR_EACH_DEFINED_FUNCTION (node) +{ + hsa_function_summary *s = hsa_summaries->get (node); + + /* A linked function is skipped. 
*/ + if (s->m_binded_function != NULL) + continue; + + if (s->m_kind != HSA_NONE) + { + if (!check_warn_node_versionable (node)) + continue; + cgraph_node *clone = node->create_virtual_clone + (vec (), NULL, NULL, "hsa"); + TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl); + + clone->force_output = true; + hsa_summaries->link_functions (clone, node, s->m_kind, false); + + if (dump_file) + fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n", +clone->name (), +s->m_kind == HSA_KERNEL ? "kernel" : &qu
[hsa 8/10] HSAIL BRIG description header file (and a steering committee request)
Hi, the following patch adds a BRIG (binary representation of HSAIL) representation description. It is within a single header file describing the binary structures and constants of the format. The file comes from the HSA Foundation (I have only added the HSA_BRIG_FORMAT_H macro and check and removed some weird comments which are not present in proposed future versions of the file) and is licensed under "University of Illinois/NCSA Open Source License." The license is "GPL-compatible" according to FSF (http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses) so I believe we can have it in GCC. Nevertheless, it is not GPL and there is no copyright assignment for it, but the situation is hopefully analogous to some other libraries that have their upstream elsewhere but we ship them as part of the GCC. I would therefore like to ask the GCC steering committee for permission to add this file to GCC (and update it as HSA standard evolves). Please let me know if there is something more I need to do in this regard. Thanks, Martin 2015-12-04 Martin Jambor * hsa-brig-format.h: New file. diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h new file mode 100644 index 000..6e2fe75 --- /dev/null +++ b/gcc/hsa-brig-format.h @@ -0,0 +1,1277 @@ +// University of Illinois/NCSA +// Open Source License +// +// Copyright (c) 2013-2015, Advanced Micro Devices, Inc. +// All rights reserved. 
+// +// Developed by: +// +// HSA Team +// +// Advanced Micro Devices, Inc +// +// www.amd.com +// +// Permission is hereby granted, free of charge, to any person obtaining a copy of +// this software and associated documentation files (the "Software"), to deal with +// the Software without restriction, including without limitation the rights to +// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +// of the Software, and to permit persons to whom the Software is furnished to do +// so, subject to the following conditions: +// +// * Redistributions of source code must retain the above copyright notice, +// this list of conditions and the following disclaimers. +// +// * Redistributions in binary form must reproduce the above copyright notice, +// this list of conditions and the following disclaimers in the +// documentation and/or other materials provided with the distribution. +// +// * Neither the names of the HSA Team, University of Illinois at +// Urbana-Champaign, nor the names of its contributors may be used to +// endorse or promote products derived from this Software without specific +// prior written permission. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS +// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE +// SOFTWARE. 
+ +#ifndef HSA_BRIG_FORMAT_H +#define HSA_BRIG_FORMAT_H + +typedef uint32_t BrigVersion32_t; + +enum BrigVersion { + +BRIG_VERSION_HSAIL_MAJOR = 1, +BRIG_VERSION_HSAIL_MINOR = 0, +BRIG_VERSION_BRIG_MAJOR = 1, +BRIG_VERSION_BRIG_MINOR = 0 +}; + +typedef uint8_t BrigAlignment8_t; + +typedef uint8_t BrigAllocation8_t; + +typedef uint8_t BrigAluModifier8_t; + +typedef uint8_t BrigAtomicOperation8_t; + +typedef uint32_t BrigCodeOffset32_t; + +typedef uint8_t BrigCompareOperation8_t; + +typedef uint16_t BrigControlDirective16_t; + +typedef uint32_t BrigDataOffset32_t; + +typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t; + +typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t; + +typedef BrigDataOffset32_t BrigDataOffsetString32_t; + +typedef uint8_t BrigExecutableModifier8_t; + +typedef uint8_t BrigImageChannelOrder8_t; + +typedef uint8_t BrigImageChannelType8_t; + +typedef uint8_t BrigImageGeometry8_t; + +typedef uint8_t BrigImageQuery8_t; + +typedef uint16_t BrigKind16_t; + +typedef uint8_t BrigLinkage8_t; + +typedef uint8_t BrigMachineModel8_t; + +typedef uint8_t BrigMemoryModifier8_t; + +typedef uint8_t BrigMemoryOrder8_t; + +typedef uint8_t BrigMemoryScope8_t; + +typedef uint16_t BrigOpcode16_t; + +typedef uint32_t BrigOperandOffset32_t; + +typedef uint8_t BrigPack8_t; + +typedef uint8_t BrigProfile8_t; + +typedef uint16_t BrigRegisterKind16_t; + +typedef uint8_t BrigRound8_t; + +typedef uint8_t BrigSamplerAddressing8_t; + +typedef uint8_t BrigSamplerCoordNormalization8_t; + +typedef uint8_t BrigSamplerFilter8_t; + +typedef uint8_t BrigSamplerQuery8_t; + +typedef uint32_t BrigSectionIndex32_t; + +typedef uint8_t BrigSegCvtModifier8_t; + +typedef uint8_t BrigSegment8_t; + +typedef uint32_t BrigStringOffset32_t; + +typedef u
[hsa 10/10] HSA register allocator
Hi,

because the HSA backend is not based on RTL, we need our own register allocator, and it is in this patch. The allocator has been written by Michael Matz and I have put it into a separate email so that I can add him to CC, because he is much better suited to answer any questions or review comments.

Thanks,

Martin

2015-12-04  Michael Matz
	    Martin Jambor

	* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
new file mode 100644
index 000..9db4c1d
--- /dev/null
+++ b/gcc/hsa-regalloc.c
@@ -0,0 +1,719 @@
+/* HSAIL IL Register allocation and out-of-SSA.
+   Copyright (C) 2013-15 Free Software Foundation, Inc.
+   Contributed by Michael Matz
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "function.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "cfghooks.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+
+/* Process a PHI node PHI of basic block BB as a part of naive out-of-ssa.
*/ + +static void +naive_process_phi (hsa_insn_phi *phi) +{ + unsigned count = phi->operand_count (); + for (unsigned i = 0; i < count; i++) +{ + gcc_checking_assert (phi->get_op (i)); + hsa_op_base *op = phi->get_op (i); + hsa_bb *hbb; + edge e; + + if (!op) + break; + + e = EDGE_PRED (phi->m_bb, i); + if (single_succ_p (e->src)) + hbb = hsa_bb_for_bb (e->src); + else + { + basic_block old_dest = e->dest; + hbb = hsa_init_new_bb (split_edge (e)); + + /* If switch insn used this edge, fix jump table. */ + hsa_bb *source = hsa_bb_for_bb (e->src); + hsa_insn_sbr *sbr; + if (source->m_last_insn + && (sbr = dyn_cast (source->m_last_insn))) + sbr->replace_all_labels (old_dest, hbb->m_bb); + } + + hsa_build_append_simple_mov (phi->m_dest, op, hbb); +} +} + +/* Naive out-of SSA. */ + +static void +naive_outof_ssa (void) +{ + basic_block bb; + + hsa_cfun->m_in_ssa = false; + + FOR_ALL_BB_FN (bb, cfun) + { +hsa_bb *hbb = hsa_bb_for_bb (bb); +hsa_insn_phi *phi; + +for (phi = hbb->m_first_phi; +phi; +phi = phi->m_next ? as_a (phi->m_next): NULL) + naive_process_phi (phi); + +/* Zap PHI nodes, they will be deallocated when everything else will. */ +hbb->m_first_phi = NULL; +hbb->m_last_phi = NULL; + } +} + +/* Return register class number for the given HSA TYPE. 0 means the 'c' one + bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class + and 3 for 'q' 128 bit class. 
*/ + +static int +m_reg_class_for_type (BrigType16_t type) +{ + switch (type) +{ +case BRIG_TYPE_B1: + return 0; + +case BRIG_TYPE_U8: +case BRIG_TYPE_U16: +case BRIG_TYPE_U32: +case BRIG_TYPE_S8: +case BRIG_TYPE_S16: +case BRIG_TYPE_S32: +case BRIG_TYPE_F16: +case BRIG_TYPE_F32: +case BRIG_TYPE_B8: +case BRIG_TYPE_B16: +case BRIG_TYPE_B32: +case BRIG_TYPE_U8X4: +case BRIG_TYPE_S8X4: +case BRIG_TYPE_U16X2: +case BRIG_TYPE_S16X2: +case BRIG_TYPE_F16X2: + return 1; + +case BRIG_TYPE_U64: +case BRIG_TYPE_S64: +case BRIG_TYPE_F64: +case BRIG_TYPE_B64: +case BRIG_TYPE_U8X8: +case BRIG_TYPE_S8X8: +case BRIG_TYPE_U16X4: +case BRIG_TYPE_S16X4: +case BRIG_TYPE_F16X4: +case BRIG_TYPE_U32X2: +case BRIG_TYPE_S32X2: +case BRIG_TYPE_F32X2: + return 2; + +case BRIG_TYPE_B128: +case BRIG_TYPE_U8X16: +case BRIG_TYPE_S8X16: +case BRIG_TYPE_U16X8: +case BRIG_TYPE_S16X8: +case BRIG_TYPE_F16X8: +case BRIG_TYPE_U32X4: +case BRIG_TYPE_U64X2: +case BRIG_TYPE_S32X4: +case BRIG_TYPE_S64X2: +case BRIG_TYPE_F32X4: +case BRIG_TYPE_F64X2: + return 3; + +default: + gcc_unreachable (); +} +} + +/* If the Ith
Re: ipa-cp heuristics fixes
Hi, thanks for looking into this, I only have one question: On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote: > Martin, > while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues. > First of all, ipcp_cloning_candidate_p calls > optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)) > which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION > around. I replaced it by node->optimize_for_size_p (). > > Second we perform incredible number of clones because we do obtain some sort > of > polymorphic call context for them. In wast majority of cases this is useless > effort, because the functions in question do not contain virtual calls and do > not pass the parameter further. For firefox about 40k out of 50k clones > created are created just because we found some context. > > I changed the code to only clone if this immediately leads to > devirtualization. > This do not cause any noticeable drop in number of devirtualized calls on > Firefox. I suppose we will miss the case where cloning a caller may allow > devirtualization in a clone of callee, but I do not think the heuristics for > context independent values can handle this as implemented right now and it > simply have way to many false positives. > > What we can do is to devirtualize w/o cloning for local functions and > speculatively devirtualize in case we would otherwise clone. > > Third problem I noticed is that > will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can > ignore the function size when deciding about the code size impact. > This function is doing some analysis for inliner where it, for example, > analyses > if a comdat which is going to be inlined consistently in the whole program > will be removed. > > In the cloning case I do not see this to apply: we have no evidence that the > other units will pass the same constants to the function. 
I think you > basically want to assume that the function will be removed if it has no > address taken and it is not externally visibible. This is what local flag > is for. > > I gathered some stats: > > number of clones for all contexts: 49948->11102 > number of clones: 4376->4383 > > good_cloning_opportunity_p is called about 70k times, I wonder if the > thresholds are not simply set too high. For example, inliner does about 300k > inlines at Firefox. > > number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I > do not have data on unpatched tree for this) > number of devirts: 956->933 > number of devirts happening at inline: 781->868 > number of indirect calls promoted: 512->512 > > Inliner stats from: Unit growth for small function inlining: 7965701->9130051 > (14%) > to: Unit growth for small function inlining: 7965010->9138577 > > So it seems that except for large drop in number of clones there is no > significant difference. > > I am bootstrapping/regtesting this on x86_64-linux, does it seem OK? > > Honza > > * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p. > (good_cloning_opportunity_p): Likewise. > (gather_context_independent_values): Do not return true when > polymorphic call context is known or when we have known aggregate > value of unused parameter. > (estimate_local_effects): Try to create clone for all context > when either some params are substituted or devirtualization is possible > or some params can be removed; use local flag instead of > node->will_be_removed_from_program_if_no_direct_calls_p. > (identify_dead_nodes): Likewise. 
> Index: ipa-cp.c > === > --- ipa-cp.c (revision 231477) > +++ ipa-cp.c (working copy) > @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_ >return false; > } > > - if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))) > + if (node->optimize_for_size_p ()) > { >if (dump_file) > fprintf (dump_file, "Not considering %s for cloning; " > @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap > { >if (time_benefit == 0 >|| !opt_for_fn (node->decl, flag_ipa_cp_clone) > - || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))) > + || node->optimize_for_size_p ()) > return false; > >gcc_assert (size_cost > 0); > @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc > *removable_params_cost > += ipa_get_param_move_cost (info, i); > > + if (!ipa_is_param_used (info, i)) > + continue; > + Is this really necessary, is it not enough to remove the assignment to ret below? If the parameter is not used, devirtualization time bonus, which you then rely on estimate_local_effects, should be zero for it. It is a very minor point, I suppose, but if the function gets cloned for a different reason, it might still be beneficial to have as much context-
Re: [hsa 1/10] Configury changes and new options
Hi, On Tue, Dec 08, 2015 at 10:43:15PM +, Richard Sandiford wrote: > [Sorry for the low-quality review, was just reading out of interest...] > > Martin Jambor writes: > > +If you configure GCC with HSA offloading but do not have the HSA > > +run-time library installed in a standard location then you can > > +explicitely specify the directory where they are installed. The > > typo: explicitly oops. For some reason, my spell-checker accepts this typo. I will fix it. > > > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > > index e4772d1..5609207 100644 > > --- a/gcc/lto-wrapper.c > > +++ b/gcc/lto-wrapper.c > > @@ -745,6 +745,11 @@ compile_images_for_offload_targets (unsigned in_argc, > > char *in_argv[], > >offload_names = XCNEWVEC (char *, num_targets + 1); > >for (unsigned i = 0; i < num_targets; i++) > > { > > + /* HSA does not use LTO-like streaming and a different compiler, skip > > +it. */ > > + if (strncmp(names[i], "hsa", 3) == 0) > > + continue; > > + > >offload_names[i] > > = compile_offload_image (names[i], compiler_path, in_argc, in_argv, > > compiler_opts, compiler_opt_count, > > Looks like this would cause the caller loop: > > if (offload_names) > { > find_offloadbeginend (); > for (i = 0; offload_names[i]; i++) > printf ("%s\n", offload_names[i]); > free_array_of_ptrs ((void **) offload_names, i); > } > > to terminate early if there was another target after hsa. > Good catch. I have modified this code so that it never leaves any holes in offload_names[i]. > names[i] is null-terminated, so it looks like you're deliberately > allowing anything that starts with "hsa" here, but: Right, and that was probably a mistake, I have changed the check to simple strcmp. > > > diff --git a/gcc/opts.c b/gcc/opts.c > > index 874c84f..5647f0c 100644 > > --- a/gcc/opts.c > > +++ b/gcc/opts.c > > @@ -1906,8 +1906,35 @@ common_handle_option (struct gcc_options *opts, > >break; > > > > case OPT_foffload_: > > - /* Deferred. 
*/ > > - break; > > + { > > + const char *p = arg; > > + opts->x_flag_disable_hsa = true; > > + while (*p != 0) > > + { > > + const char *comma = strchr (p, ','); > > + > > + if ((strncmp (p, "disable", 7) == 0) > > + && (p[7] == ',' || p[7] == '\0')) > > + { > > + opts->x_flag_disable_hsa = true; > > + break; > > + } > > + > > + if ((strncmp (p, "hsa", 3) == 0) > > + && (p[3] == ',' || p[3] == '\0')) > > + { > > +#ifdef ENABLE_HSA > > + opts->x_flag_disable_hsa = false; > > +#else > > + sorry ("HSA has not been enabled during configuration"); > > +#endif > > ...here you only allow "hsa" itself. > > (Not your fault, but: do we have any documentation for -foffload > and -foffload-abi? Couldn't see any in the texi files.) Yes, that is actually PR 67300. However, I do not understand the more complex forms the parameter can take enough to attempt to fix it. In order to address all for you concerns, I am going to install the following on the branch. Thanks for the feedback, Martin 2015-12-09 Martin Jambor * lto-wrapper.c (compile_images_for_offload_targets): Do not leave holes in offload_names. Use strcmp instead strncmp. * doc/install.texi (--with-hsa-runtime): Fix typo. --- gcc/doc/install.texi | 2 +- gcc/lto-wrapper.c| 8 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index afd891c..a85a063 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1993,7 +1993,7 @@ compiler will emit the accelerator code, no path should be specified. If you configure GCC with HSA offloading but do not have the HSA run-time library installed in a standard location then you can -explicitely specify the directory where they are installed. The +explicitly specify the directory where they are installed. 
The @option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a shorthand for @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 5609207..5b58fd6 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc
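For readers following the review, the two lto-wrapper.c points raised in this message (do not leave holes in a NULL-terminated array, and match "hsa" exactly rather than by prefix) can be illustrated with a minimal sketch. This is not the code that was committed; `build_offload_names`, `fake_compile_image` and `count_names` are hypothetical names, and plain `calloc`/`malloc` stand in for GCC's allocation helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for compile_offload_image: returns a
   heap-allocated image name for TARGET, or NULL on allocation failure.  */
static char *
fake_compile_image (const char *target)
{
  size_t len = strlen (target) + sizeof ".img";
  char *name = malloc (len);
  if (name != NULL)
    snprintf (name, len, "%s.img", target);
  return name;
}

/* Build a NULL-terminated array of image names for NAMES[0..NUM_TARGETS),
   skipping the target called exactly "hsa".  A separate output index keeps
   the array dense, so a consumer that stops at the first NULL still sees
   every entry that follows a skipped one.  */
static char **
build_offload_names (const char *const *names, unsigned num_targets)
{
  assert (names != NULL);
  char **out = calloc (num_targets + 1, sizeof *out);
  unsigned next = 0;

  if (out == NULL)
    return NULL;
  for (unsigned i = 0; i < num_targets; i++)
    {
      /* strcmp, not strncmp (..., 3): only the exact name is skipped.  */
      if (strcmp (names[i], "hsa") == 0)
	continue;
      char *image = fake_compile_image (names[i]);
      if (image != NULL)
	out[next++] = image;
    }
  return out;
}

/* Count entries the way the caller loop quoted above walks the array.  */
static unsigned
count_names (char **names)
{
  unsigned n = 0;
  while (names[n] != NULL)
    n++;
  return n;
}
```

With the targets "nvptx-none", "hsa" and a made-up "hsail-like", the sketch yields two entries: the entry after "hsa" is not lost, and the name that merely starts with "hsa" is not skipped.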
Re: [hsa 1/10] Configury changes and new options
Hi, On Mon, Dec 07, 2015 at 12:19:08PM +0100, Martin Jambor wrote: > Hi, > > this patch contains changes to the configuration mechanism and offload > bits, so that users can build compilers with HSA support. When writing up how to build an HSA-enabled GCC for the wiki page and checking that the process actually works, I realized that AMD no longer ships libhsakmt as a part of the run-time. So we either have to tell users to copy the library over to the directory where the run-time is (which is what I did on my machines and then forgot about) or provide one more configuration option; otherwise configuring libgomp fails. The patch below does the latter. I have verified that the configuration works as intended for a freshly downloaded and built HSA run-time and libhsakmt. I am going to commit it to the branch shortly, and of course it will need to be part of the hsa merge. Sorry for realizing this so late, Martin 2015-12-09 Martin Jambor libgomp/ * plugin/configfrag.ac (hsa-kmt-lib): New. gcc/ * doc/install.texi (Configuration): Document --with-hsa-kmt-lib. --- gcc/doc/install.texi | 5 + libgomp/configure| 17 +++-- libgomp/plugin/configfrag.ac | 8 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 232586d..afd891c 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1999,6 +1999,11 @@ shorthand for @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}. +@item --with-hsa-kmt-lib=@var{pathname} + +If you configure GCC with HSA offloading but do not have the HSA +KMT library installed in a standard location then you can +explicitly specify the directory where it resides. 
@end table @subheading Cross-Compiler-Specific Options diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac index c50e5cb..fd77429 100644 --- a/libgomp/plugin/configfrag.ac +++ b/libgomp/plugin/configfrag.ac @@ -118,6 +118,14 @@ if test "x$HSA_RUNTIME_LIB" != x; then HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB fi +AC_ARG_WITH(hsa-kmt-lib, + [AS_HELP_STRING([--with-hsa-kmt-lib=PATH], + [specify directory for installed HSA KMT library.])]) +if test "x$with_hsa_kmt_lib" != x; then + HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib" + HSA_RUNTIME_LIB= +fi + PLUGIN_HSA=0 PLUGIN_HSA_CPPFLAGS= PLUGIN_HSA_LDFLAGS= -- 2.6.3
Re: [hsa 2/10] Modifications to libgomp proper
Hi, thanks for the feedback. I have incorporated most of it into the branch (the diff is below) but also have a few questions. On Wed, Dec 09, 2015 at 12:35:36PM +0100, Jakub Jelinek wrote: > On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote: > > +/* Flag set when the subsequent element in the device-specific argument > > + values. */ > > +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7) > > + > > +/* Bitmask to apply to a target argument to find out the value identifier. > > */ > > +#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8) > > +/* Target argument index of NUM_TEAMS. */ > > +#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8) > > +/* Target argument index of THREAD_LIMIT. */ > > +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8) > > I meant that these two would be just special, passed as the first two > pointers in the array, without the markup. Because, otherwise you either > need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit > arches and for 64-bit ones shift often at runtime. Having the markup even > for them is perhaps cleaner, but less efficient, so if you really want to go > that way, please make sure you handle it properly for 32-bit pointers > architectures though. num_teams or thread_limit could be > 32767 or > > 65535. I see. I prefer the clean approach even if it is more work, because this interface looks like it is going to be extended in the future. But I am wondering whether embedding the value into the identifier element is actually worth it. The passed array is going to be a small local variable, and I wonder whether there is any benefit in it having two elements instead of four (or four instead of six for gridified kernels), especially if it means introducing control flow on the part of the caller. But if you really want it that way, I will implement that. 
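To make the trade-off discussed above concrete, here is a minimal sketch of the "marked-up" encoding. A small value is packed into the same element as its identifier; a larger one takes a second element, flagged with GOMP_TARGET_ARG_SUBSEQUENT_PARAM. The four GOMP_TARGET_ARG_* values are the ones quoted from the patch. Everything else is an assumption made for illustration: the 16-bit `SKETCH_VALUE_SHIFT`, the 32767 cut-off (chosen so the packed word still fits a 32-bit pointer), and the helper names `push_target_arg` and `read_target_arg`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted from the patch under review.  */
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7)
#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)

/* Assumption for this sketch only: embedded values sit above bit 16.  */
#define SKETCH_VALUE_SHIFT 16

/* Append identifier ID with VALUE to ARGS starting at index POS and return
   the next free index.  Values in [0, 32767] are embedded in the identifier
   element; anything else is stored in a second, subsequent element.  This
   branch is the caller-side control flow mentioned in the thread.  */
static size_t
push_target_arg (void **args, size_t pos, int id, int value)
{
  assert (args != NULL);
  if (value >= 0 && value <= 32767)
    {
      intptr_t packed = ((intptr_t) value << SKETCH_VALUE_SHIFT) | id;
      args[pos++] = (void *) packed;
    }
  else
    {
      args[pos++]
	= (void *) (intptr_t) (id | GOMP_TARGET_ARG_SUBSEQUENT_PARAM);
      args[pos++] = (void *) (intptr_t) value;
    }
  return pos;
}

/* Decode the argument starting at ARGS[POS] into *ID and *VALUE and return
   the index of the element that follows it.  */
static size_t
read_target_arg (void *const *args, size_t pos, int *id, int *value)
{
  intptr_t word = (intptr_t) args[pos++];

  *id = (int) (word & GOMP_TARGET_ARG_ID_MASK);
  if (word & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
    *value = (int) (intptr_t) args[pos++];
  else
    *value = (int) (word >> SKETCH_VALUE_SHIFT);
  return pos;
}
```

In this sketch a num_teams of 4 occupies one element, while a thread_limit of 100000 needs two. Always using the two-element form would remove the branch in `push_target_arg` at the cost of a slightly longer array, which is the alternative weighed in the reply above.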
> > > -static void > > -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, > > - void **hostaddrs, size_t *sizes, > > - unsigned short *kinds) > > +static void * > > +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs, > > + size_t *sizes, unsigned short *kinds) > > { > >size_t i, tgt_align = 0, tgt_size = 0; > >char *tgt = NULL; > > @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void > > *), size_t mapnum, > >} > >if (tgt_align) > > { > > - tgt = gomp_alloca (tgt_size + tgt_align - 1); > > + tgt = gomp_malloc (tgt_size + tgt_align - 1); > > I don't like using gomp_malloc here, either copy/paste the function, or > create separate inline functions for the two loops, one for the first loop > which returns you tgt_align and tgt_size, and another for the stuff after > the allocation. Then you can use those two inline functions to implement > both gomp_target_fallback_firstprivate which will use alloca, and > gomp_target_unshare_firstprivate which will use gomp_malloc instead. OK, I did that. > > > @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const > > void *unused, > > and several arguments have been added: > > FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. > > DEPEND is array of dependencies, see GOMP_task for details. > > + ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and > > a > > + variable number of device-specific arguments, which always take two > > elements > > + where the first specifies the type and the second the actual value. The > > + last element of the array is a single NULL. > > Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not > encoded. 
I have changed the comment, but will remember to update it again if necessary after changing omp-low.c. > > > @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, > > size_t mapnum, > >struct gomp_device_descr *devicep = resolve_device (device); > > > >if (devicep == NULL > > + || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) > >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) > > Would be nice to have some consistency in the order of capabilities checks. > Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way > here too. Sure. > > > @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data) > > > >if (ttask->state == GOMP_TARGET_TASK_FINISHED) > > { > > - 