[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx90c support

2024-04-26 Thread Andrew Stubbs
I will push this shortly. I think the gfx90c patch just made the cut for the GCC-14 branch! Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index fce0fb44..47fef32d 100644

Re: [PATCH] amdgcn: Add gfx90c target

2024-04-26 Thread Andrew Stubbs
On 25/04/2024 19:37, Frederik Harwath wrote: Hi Andrew, this patch adds support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html) lists those devices as unsupported by rocm-amdhsa. As we have discussed elsewhere, I have tested

Re: [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Andrew Stubbs
On 25/04/2024 11:51, Tobias Burnus wrote: Motivated by a surprise of a colleague that with -m32, no offload dumps were created; that's because mkoffload does not process host binaries when they are 32bit (i.e. ilp32). Internally, that is done as follows: The host compiler passes to 'mkoffload' the

Re: GCN: Enable effective-target 'vect_long_long'

2024-04-17 Thread Andrew Stubbs
On 16/04/2024 20:01, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_long'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) I think if there are still missing int64 vector operations then they're

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 13:00, Richard Biener wrote: On Mon, Apr 15, 2024 at 12:04 PM Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 11:03, Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To this change or in general?) Current version:

Re: GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'

2024-04-08 Thread Andrew Stubbs
On 08/04/2024 11:45, Thomas Schwinge wrote: Hi! On 2024-03-28T08:00:50+0100, I wrote: On 2024-03-22T15:54:48+, Andrew Stubbs wrote: This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used

Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:27, Jakub Jelinek wrote: On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote: @@ -3954,8 +3956,8 @@ on the GPU. To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with @option{--with-arch=@code{fiji}} or

Re: [Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:05, Tobias Burnus wrote: This patch handles --with-arch= in GCN's mkoffload.cc While mkoffload mostly does not know this and passes it through to the GCN lto1 compiler, it writes an .o file with debug information - and here the -march= in the ELF flags must agree with the one

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Andrew Stubbs
On 25/03/2024 11:27, Richard Biener wrote: Add support for the gfx1036 RDNA2 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. x86 host bootstrap/regtest running, target-libgomp testing for the offload

Re: GCN: Enable effective-target 'vect_long_mult'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew

Re: GCN: Enable effective-target 'vect_hw_misalign'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_hw_misalign'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew.

[wwwdocs, committed] gcc-14: amdgcn: Add gfx1103

2024-03-22 Thread Andrew Stubbs
I added a note about gfx1103 to the existing text for gfx1100. Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index d88fbc96..880b9195 100644 ---

[committed] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-03-22 Thread Andrew Stubbs
The RDNA devices have different cache architectures to the CDNA devices, and the differences go deeper than just the assembler mnemonics, so we probably need to generate different code to maintain coherency across the whole device. I believe this patch is correct according to the documentation in

[committed] amdgcn: Prefer V32 on RDNA devices

2024-03-22 Thread Andrew Stubbs
This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used where they are hard-coded (such as function prologues). We run these devices in wavefrontsize64 for compatibility, but they actually only have

[committed] amdgcn: Add gfx1103 target

2024-03-22 Thread Andrew Stubbs
This patch adds support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. This device should be considered "Experimental" at this point, although so far it seems to be at least as

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 08:43, Richard Biener wrote: I'll note that we don't pass 'val' there and 'val' is unfortunately not documented - what's it supposed to be? I think I placed the original fix in do_compare_and_jump because we have the full info available there. So what's the

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 11:56, Thomas Schwinge wrote: Hi Andrew! On 2024-03-21T13:39:53+, Andrew Stubbs wrote: CUmode "on" is the setting for compatibility with GCN and CDNA devices. --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -107,6 +107,7 @@ extern un

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 15:18, Richard Biener wrote: On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs wrote: My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works

[PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works too. These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when configured to use

[committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-21 Thread Andrew Stubbs
CUmode "on" is the setting for compatibility with GCN and CDNA devices. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode. --- gcc/config/gcn/gcn-hsa.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/gcn/gcn-hsa.h

[committed] amdgcn: Comment correction

2024-03-21 Thread Andrew Stubbs
The location of the marker was changed, but the comment wasn't updated. Fixed now. Committed to mainline gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction. --- gcc/config/gcn/gcn.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git

[committed] amdgcn: Clean up device memory in gcn-run

2024-03-21 Thread Andrew Stubbs
There are some stability issues in the ROC runtime or drivers when we run too many tests in quick succession. I was hoping this patch might fix it, but no; still good to fix the omissions though. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-run.cc (main): Add an

Re: GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! On 2024-01-12T15:02:35+0100, I wrote: OK to push the attached "GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"? Ping. (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...)

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 13:56, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: This is more-or-less what I was planning to do myself, but as I want to include all the other features that get parametrized in gcn.cc, gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. Unfortunately, I

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 12:21, Tobias Burnus wrote: Given the large number of AMD GPU ISAs and the number of files which have to be adapted, I wonder whether it makes sense to consolidate this a bit, especially in the light that we may want to support more in the future. Besides using some macros, I

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 07:35, Richard Biener wrote: On Fri, Mar 15, 2024 at 4:35 AM Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 03:45, Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in vect

[PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Andrew Stubbs
Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in vect/tsvc/vect-tsvc-s278.c and vect/tsvc/vect-tsvc-s279.c if I set the preferred vector size to V32
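The point about excess lanes can be illustrated with a small, self-contained C sketch. It is not the GCC code from the patch: the function names and the 64-bit container are assumptions made only for illustration. A 32-lane mask stored in a 64-bit integer is inverted once with a plain NOT, which also sets the 32 unused high bits, and once with an XOR against the active-lane bits, which leaves them clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: plain bitwise NOT of the whole container.
   Every bit flips, including bits beyond the vector's real lanes.  */
static uint64_t invert_all_bits(uint64_t mask)
{
    return ~mask;
}

/* Hypothetical helper: invert only the lanes named in `active`.
   `active` has one bit set per real lane (e.g. 0xFFFFFFFF for 32 lanes).
   Bits outside `active` are expected to be clear and stay clear.  */
static uint64_t invert_active_lanes(uint64_t mask, uint64_t active)
{
    assert((mask & ~active) == 0);  /* no excess bits on input */
    return mask ^ active;
}
```

With `active = 0xFFFFFFFF` and a mask of `0x0F`, the XOR variant yields `0xFFFFFFF0`, whereas the NOT variant yields `0xFFFFFFFFFFFFFFF0`, i.e. it "enables" 32 lanes that do not exist.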

Re: GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)

2024-03-08 Thread Andrew Stubbs
On 08/03/2024 10:16, Thomas Schwinge wrote: Hi! So, attached here is now a different patch "GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)", that takes a different approach re clarifying the two orthogonal aspects that the

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 13:37, Thomas Schwinge wrote: Hi Andrew! On 2024-03-07T11:38:27+, Andrew Stubbs wrote: On 07/03/2024 11:29, Thomas Schwinge wrote: On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 11:29, Thomas Schwinge wrote: Hi! On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits to go with it. An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is different from

Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 13:49, Thomas Schwinge wrote: Hi! On 2024-01-24T12:43:04+, Andrew Stubbs wrote: This [...] ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08 "amdgcn: additional gfx1030/gfx1100 support", and included the following: --- a/gcc/config/gcn/gcn-valu.md

Re: Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 12:09, Thomas Schwinge wrote: Hi! On 2024-02-21T17:32:13+0100, Richard Biener wrote: Am 21.02.2024 um 13:34 schrieb Thomas Schwinge : [...] per my work on "libgomp make check time is excessive", all execution testing in libgomp is serialized in

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-03-04 Thread Andrew Stubbs
On 23/02/2024 15:13, Richard Biener wrote: On Fri, 23 Feb 2024, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 02:22:19PM +, Andrew Stubbs wrote: On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set

[PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set to zero, resulting in wrong-code. The mask was broken when nunits==32. The patched version will probably be
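As a generic illustration of this class of bug (the snippet does not show the actual expression in the vectorizer, so nothing here is taken from the patch), a lane mask of the form `(1 << nunits) - 1` is only safe when the shift is done in a type wider than `nunits` bits. The sketch below, with a hypothetical function name, computes the mask in 64 bits and treats the full-width case separately so that no shift ever equals the type width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return a mask with the low `nunits` bits set.
   The shift is performed on a 64-bit value, so nunits == 32 is safe;
   nunits == 64 is special-cased because shifting a 64-bit value by 64
   is undefined behaviour in C.  */
static uint64_t mask_for_nunits(unsigned nunits)
{
    assert(nunits <= 64);
    if (nunits == 64)
        return ~UINT64_C(0);
    return (UINT64_C(1) << nunits) - 1;
}
```

Doing the same computation on a 32-bit `int` would shift by the full width of the type when `nunits` is 32, which C leaves undefined; that is the kind of situation in which a mask can silently come out wrong.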

Re: GCN: Conditionalize 'define_expand "reduc__scal_"' on '!TARGET_RDNA2_PLUS' [PR113615]

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 14:34, Thomas Schwinge wrote: Hi! On 2024-01-29T11:34:05+0100, Tobias Burnus wrote: Andrew wrote off list: "Vector reductions don't work on RDNA, as is, but they're supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 12:26, Richard Biener wrote: On Fri, 16 Feb 2024, Andrew Stubbs wrote: On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the l

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:23, Thomas Schwinge wrote: Hi! On 2024-02-15T08:49:17+0100, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:21, Richard Biener wrote: [snip] I suppose if RDNA really only has 32 lane vectors (it sounds like it, even if it can "simulate" 64 lane ones?) then it might make sense to vectorize for 32 lanes? That said, with variable-length it likely doesn't matter but I'd not expose

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 07:49, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL". The RDNA2 I

Re: [PATCH] libgomp: testsuite: Don't XPASS libgomp.c/alloc-pinned-1.c etc. on non-Linux targets [PR113448]

2024-02-12 Thread Andrew Stubbs
On 05/02/2024 13:04, Rainer Orth wrote: Two libgomp tests XPASS on Solaris (any non-Linux target actually) since their introduction: XPASS: libgomp.c/alloc-pinned-1.c execution test XPASS: libgomp.c/alloc-pinned-2.c execution test The problem is that the test just prints OS unsupported and

Re: GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 13:49, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.c +void +gcn_hsa_declare_function_name (FILE *file, const char *name, tree

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 11:36, Thomas Schwinge wrote: Hi! On 2024-01-31T11:31:00+, Andrew Stubbs wrote: On 31/01/2024 10:36, Thomas Schwinge wrote: OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for

Re: GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:21, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:23+, Andrew Stubbs wrote: This patch contains the machine description portion of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.md +;; {{{ Constants and enums + +; Named registers +(define_constants

Re: GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:12, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.h +#define FIRST_SGPR_REG 0 +#define SGPR_REGNO(N) ((N)+FIRST_SGPR_REG

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 10:36, Thomas Schwinge wrote: Hi! OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like: Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 12:50, Tobias Burnus wrote: Andrew Stubbs wrote: /tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range    .amdhsa_next_free_vgpr    516 ^~~ [Obviously, likewise for libgomp.c++/.. Hmm, supposedly there are 768

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 10:34, Tobias Burnus wrote: Andrew wrote off list:   "Vector reductions don't work on RDNA, as is, but they're    supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about vectorization and in the code path shown in the backtrace. I can

Re: [wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 17:06, Tobias Burnus wrote: Mention that gfx1030/gfx1100 are now supported. As noted in another thread, LLVM 15's assembler is now required, before LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would do.) Hence, the added link to the install documentation.

Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 16:45, Tobias Burnus wrote: Hi, Thomas Schwinge wrote: amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs ... Further down in that file, we state: @anchor{amdgcn-x-amdhsa} @heading amdgcn-*-amdhsa AMD GCN GPU target. Instead

Re: [patch][v2] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-29 Thread Andrew Stubbs
On 25/01/2024 15:11, Tobias Burnus wrote: Updated patch enclosed. Tobias Burnus wrote: I have now run the attached script and the result running yesterday's build with both my patch and your patch applied. (And the now committed gcn-hsa.h patch) Now the result with the testscript is: *

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:21, Richard Biener wrote: On Fri, 26 Jan 2024, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote: Otherwise it looks reasoanble to me, but let's see what Andrew thinks. 'n' before 'a', please. ;-) ?! I've misspelled a word. @@ -1443,6

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:04, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unsupported) which will otherwise be made

Re: [PATCH] Fix architecture support in OMP_OFFLOAD_init_device for gcn

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 11:42, Richard Biener wrote: The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. Tested as previously, now the error is libgomp: GCN fatal

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:39, Tobias Burnus wrote: Hi all, Andrew Stubbs wrote: On 26/01/2024 07:29, Richard Biener wrote: If you link against prebuilt objects with COV 5 it seems there's no way to override the COV version GCC uses?  That is, do we want to add a -mcode-object-version=... option

Re: [PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:40, Richard Biener wrote: The following avoids selecting an unsupported agent early, avoiding later asserts when we rely on it being supported. tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060 that's the alternative to the other patch. I do indeed seem to get the

Re: [PATCH] Avoid assert for unknown device ISAs in GCN libgomp plugin

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:30, Richard Biener wrote: When the agent reports a device ISA we don't support avoid hitting an assert, instead report the raw integers as error. I'm not sure whether -1 is special as I didn't figure where that field is initialized. But I guess since agents are not rejected

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:22, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697 # of unexpected successes 1 # of expected failures 703 # of unresolved

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 07:29, Richard Biener wrote: On Fri, Jan 26, 2024 at 12:04 AM Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 25/01/2024 23:03, Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which COV is used as long as it works; unfortunately,

Re: [patch] gcn: Add missing space to ASM_SPEC in gcn-hsa.h

2024-01-25 Thread Andrew Stubbs
On 25/01/2024 12:44, Tobias Burnus wrote: This patch avoids assembler warnings for gfx908 and gfx90a such as '-xnack-mattr=-sramecc' is not a recognized feature for this target (ignoring feature) as we pass -mattr=-xnack-mattr=-sramecc to the llvm-mc assembler. Solution: Add a space

Re: [patch] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-25 Thread Andrew Stubbs
On 24/01/2024 22:12, Tobias Burnus wrote: This patch fixes "-g" debug compilation for gfx1100 and gfx1030, which fail to link when "-g" is specified. The reason is: When using gfx1100 and compiling with '-g' I was running into an error because the eflags used for the debugger file has

[PATCH] amdgcn: additional gfx1100 support

2024-01-24 Thread Andrew Stubbs
(RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs --- gcc/config/gcn/gcn-opts.h | 2 +- gcc/config/gcn/gcn-valu.md

[PATCH] Update my email in MAINTAINERS

2024-01-23 Thread Andrew Stubbs
I've moved to BayLibre and don't have access to my codesourcery.com address, at least for a while. ChangeLog: * MAINTAINERS: Update Signed-off-by: Andrew Stubbs --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index

Re: [PATCH] gcn: Fix a warning

2024-01-23 Thread Andrew Stubbs
On Tue, 23 Jan 2024 at 10:01, Jakub Jelinek wrote: > Hi! > > I see > ../../gcc/config/gcn/gcn.cc: In function ‘void > gcn_hsa_declare_function_name(FILE*, const char*, tree)’: > ../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ > [-Wunused-parameter] > 6568 |

Re: [Patch] xfail libgomp.c/declare-variant-4-{fiji,gfx803}.c

2024-01-22 Thread Andrew Stubbs
On Fri, 19 Jan 2024 at 18:27, Tobias Burnus wrote: > The problem is as described at > https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa > > "Note that support for Fiji devices has been removed in ROCm 4.0 and > support in LLVM is deprecated and will be removed in LLVM 18." > > Therefore,

Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Andrew Stubbs
On 07/01/2024 19:20, Tobias Burnus wrote: ROCm meanwhile supports also some consumer cards; besides the semi-new gfx1030, support for gfx1100 was added more recently (in ROCm 5.7.1 for "Ubuntu 22.04 only" and without parenthesis since ROCm 6.0.0). GCC has already very limited support for

[committed] amdgcn: Match new XNACK defaults in mkoffload

2024-01-08 Thread Andrew Stubbs
This patch fixes build failures with the offload toolchain since my recent XNACK patch. The problem was simply that mkoffload made out-of-date assumptions about the -mxnack defaults. This patch fixes the mismatch. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices

[committed] amdgcn: Don't double-count AVGPRs

2024-01-08 Thread Andrew Stubbs
This patch fixes a runtime error with offload kernels that use a lot of registers, such as libgomp.fortran/target1.f90. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices have VGPRs and AVGPRs combined into a single hardware register file (they're separate in CDNA1).

Re: [Patch] gcn.h: Add builtin_define ("__gfx1030")

2024-01-08 Thread Andrew Stubbs
On 06/01/2024 21:20, Tobias Burnus wrote: Hi Andrew, I just spotted that this define was missing. OK for mainline? OK. Andrew

[committed] amdgcn: XNACK support

2023-12-13 Thread Andrew Stubbs
Some AMD GCN devices support an "XNACK" mode in which the device can handle page-misses (and maybe other traps in memory instructions), but it's not completely invisible to software. We need this now to support OpenMP Unified Shared Memory (I plan to post updated patches for that in January),

Re: [PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-13 Thread Andrew Stubbs
On 12/12/2023 09:02, Tobias Burnus wrote: On 11.12.23 18:04, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall.  Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed

Re: [PATCH v3 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-12-12 Thread Andrew Stubbs
On 12/12/2023 10:05, Tobias Burnus wrote: Hi Andrew, On 11.12.23 18:04, Andrew Stubbs wrote: This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP.  The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations

[PATCH v3 6/6] libgomp: fine-grained pinned memory allocator

2023-12-11 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot

[PATCH v3 4/6] openmp: -foffload-memory=pinned

2023-12-11 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls

[PATCH v3 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Andrew Stubbs
Use Cuda to pin memory, instead of Linux mlock, when available. There are two advantages: firstly, this gives a significant speed boost for NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit setting. The design adds a device independent plugin API for allocating pinned

[PATCH v3 2/6] libgomp, openmp: Add ompx_pinned_mem_alloc

2023-12-11 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the

[PATCH v3 3/6] openmp: Add -foffload-memory

2023-12-11 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16

[PATCH v3 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-12-11 Thread Andrew Stubbs
allocator patch have been committed. An older, less compact, version of these patches is already applied to the devel/omp/gcc-13 (OG13) branch. OK for mainline? Andrew Andrew Stubbs (5): libgomp: basic pinned memory on Linux libgomp, openmp: Add ompx_pinned_mem_alloc openmp: Add -foffload-memory

[PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-11 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This implementation will work OK for page-scale allocations, and finer-grained allocations will be
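A minimal sketch of the mmap-plus-mlock approach described in this message, assuming a Linux host. The helper names `pinned_alloc` and `pinned_free` are hypothetical and are not taken from the patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not libgomp's code): map SIZE bytes with mmap and
   try to pin them with mlock.  Returns NULL if either step fails.  */
static void *pinned_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock(p, size) != 0) {
        /* Typically fails when RLIMIT_MEMLOCK is too small.  */
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* The mapping owns whole pages, so unmapping it drops the lock without
   touching pages that belong to any other allocation.  */
static int pinned_free(void *p, size_t size)
{
    return munmap(p, size);
}
```

Because each allocation has its own mapping, freeing one cannot unpin a page shared with another live object, which is the hazard of calling mlock and munlock on malloc-backed memory. The cost is that every request is rounded up to whole pages.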

Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-07 Thread Andrew Stubbs
@Thomas, there are questions for you below On 22/11/2023 17:07, Tobias Burnus wrote: Note before: Starting with TR11 alias OpenMP 6.0, OpenMP supports handling multiple devices for allocation. It seems as if after using:   my_memspace = omp_get_device_and_host_memspace( 5 ,

[committed v4 3/3] amdgcn, libgomp: low-latency allocator

2023-12-06 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

[committed v4 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-06 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU devices, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe, and the size of the low-latency heap can be configured using

[committed v4 2/3] openmp, nvptx: low-lat memory access traits

2023-12-06 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator no longer works (but omp_cgroup_mem_alloc still does).

[committed v4 0/3] libgomp: OpenMP low-latency omp_alloc

2023-12-06 Thread Andrew Stubbs
the patches are the same. The series implements device-specific allocators and adds a low-latency allocator for both GPU architectures. Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocator openmp, nvptx: low-lat memory access traits amdgcn, libgomp: low-latency allocator gcc/config

Re: [PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-05 Thread Andrew Stubbs
On 04/12/2023 16:04, Tobias Burnus wrote: On 03.12.23 01:32, Andrew Stubbs wrote: This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc.  The memory can be allocated, reallocated, and freed using a basi

[PATCH v3 3/3] amdgcn, libgomp: low-latency allocator

2023-12-02 Thread Andrew Stubbs
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing

[PATCH v3 1/3] libgomp, nvptx: low-latency memory allocator

2023-12-02 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU devices, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe, and the size of the low-latency heap can be configured using

[PATCH v3 0/3] libgomp: OpenMP low-latency omp_alloc

2023-12-02 Thread Andrew Stubbs
not work because the default traits are incompatible (GPU low-latency memory is not accessible to other teams). I've also included documentation and addressed the comments from Tobias's review. Andrew Andrew Stubbs (3): libgomp, nvptx: low-latency memory allocator openmp, nvptx: low-lat memory

[PATCH v3 2/3] openmp, nvptx: low-lat memory access traits

2023-12-02 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator no longer works (but omp_cgroup_mem_alloc still does).

Re: [PATCH v2 1/6] libgomp: basic pinned memory on Linux

2023-11-29 Thread Andrew Stubbs
On 22/11/2023 14:26, Tobias Burnus wrote: Hi Andrew, Side remark: -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \ - calloc (1, (((void)(MEMSPACE), (SIZE)))) This belongs more with the previous patch, but I wonder whether that should use (MEMSPACE, NMEMB, SIZE) instead, to match the actual calloc
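The review comment above can be sketched as a three-argument macro. This is only an illustration of the suggestion. It is not necessarily what was committed, and the exact spelling is an assumption.

```c
#include <stdlib.h>

/* Hypothetical three-argument form following the review comment: NMEMB and
   SIZE map directly onto calloc's two parameters.  MEMSPACE is evaluated
   and discarded, as in the quoted two-argument macro.  */
#define MEMSPACE_CALLOC(MEMSPACE, NMEMB, SIZE) \
  calloc ((NMEMB), (((void) (MEMSPACE), (SIZE))))
```

Passing NMEMB through leaves the multiplication to calloc, so callers do not multiply the two values themselves.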
