Re: [PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-11-29 Thread Andrew Stubbs
On 08/09/2023 10:04, Tobias Burnus wrote: Regarding patch 2/3 and MEMSPACE_VALIDATE. In general, I wonder how to handle memory spaces (and traits) that aren't supported. Namely, when to return 0L and when to silently ignore the trait / use another memory space. The current omp_init_allocat
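
A minimal sketch of the API behaviour under discussion, using standard OpenMP 5.x names; the pinned trait and the explicit fallback are illustrative choices, not part of the patch. If the requested memory space or trait is unsupported, omp_init_allocator can either report that by returning omp_null_allocator or fall back silently.

    #include <omp.h>

    omp_allocator_handle_t
    make_low_latency_allocator (void)
    {
      /* Request low-latency memory with one (illustrative) trait.  */
      omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
      omp_allocator_handle_t a
        = omp_init_allocator (omp_low_lat_mem_space, 1, traits);

      /* If the memspace/trait combination is unsupported and the
         implementation chose to report it, the caller falls back.  */
      if (a == omp_null_allocator)
        a = omp_default_mem_alloc;
      return a;
    }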

Re: [PATCH v2 1/6] libgomp: basic pinned memory on Linux

2023-11-29 Thread Andrew Stubbs
On 22/11/2023 14:26, Tobias Burnus wrote: Hi Andrew, Side remark: -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \ - calloc (1, (((void)(MEMSPACE), (SIZE This fits a bit more to previous patch, but I wonder whether that should use (MEMSPACE, NMEMB, SIZE) instead - to fit to the actual calloc

Re: [PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use

2013-10-22 Thread Andrew Stubbs
On 21/10/13 23:01, Yufeng Zhang wrote: Hi, This patch changes the widening_mul pass to fuse the widening multiply with accumulate only when the multiply has single use. The widening_mul pass currently does the conversion regardless of the number of the uses, which can cause poor code-gen in cas

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-17 Thread Andrew Stubbs
On 17/08/12 15:04, Richard Earnshaw wrote: The fix is to make is_widening_mult_p note that it has been called with a WIDEN_MULT_EXPR and rather than decompose the operands again, to simply extract the existing operands, which have already been formulated correctly for a widening multiply operatio

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-17 Thread Andrew Stubbs
On 17/08/12 15:31, Richard Earnshaw wrote: On 17/08/12 15:22, Andrew Stubbs wrote: On 17/08/12 15:04, Richard Earnshaw wrote: The fix is to make is_widening_mult_p note that it has been called with a WIDEN_MULT_EXPR and rather than decompose the operands again, to simply extract the existing

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-17 Thread Andrew Stubbs
On 17/08/12 15:47, Richard Earnshaw wrote: If we don't have a 16x16->64 mult operation then after step 1 we'll still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2 there's nothing to short circuit. Unless, of course, you're expecting us to get step1 -> 16x16->32 widen mult step

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-17 Thread Andrew Stubbs
On 17/08/12 16:20, Richard Earnshaw wrote: No, given a u16xu16->u64 operation in the code, and that the arch doesn't have such an opcode, I'd expect to get step1 -> (u32)u16 x (u32)u16 -> u64 Hmm, I would have thought that would be more costly than (u64)(u16 x u16 -> u32) You might

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-17 Thread Andrew Stubbs
On 17/08/12 18:05, Richard Earnshaw wrote: Take two. This version should address your concerns about handling (u32)u16 * (u32)u16 -> u64 We now look at each operand directly, but when doing so we check whether the operand is the same size as the result or not. When it is, we can strip

[PATCH] Add COMPLEX_VECTOR_INT modes

2023-05-26 Thread Andrew Stubbs
Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and there are no vector equivalents of that type. Therefore, this patch adds minimal support for "complex vector int" modes.
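
For illustration only (the names below are made up, not the libgcc interface): a DIVMOD libfunc produces the quotient and remainder together, which the middle end models as a "complex int" pair; a vector version needs that same pairing in every lane.

    /* Hypothetical 64-lane integer divmod, one quotient/remainder pair per
       lane -- the vector analogue of the scalar "complex int" result.  */
    typedef struct { int quot[64]; int rem[64]; } v64si_divmod_t;

    static v64si_divmod_t
    v64si_divmod (const int a[64], const int b[64])
    {
      v64si_divmod_t r;
      for (int lane = 0; lane < 64; lane++)
        {
          r.quot[lane] = a[lane] / b[lane];
          r.rem[lane] = a[lane] % b[lane];
        }
      return r;
    }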

Re: [patch] amdgcn: Change -m(no-)xnack to -mxnack=(on,off,any)

2023-05-26 Thread Andrew Stubbs
OK. Andrew On 26/05/2023 15:58, Tobias Burnus wrote: (Update the syntax of the amdgcn commandline option in anticipation of later patches; while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for PR100208), -mxnack (contrary to -msram-ecc) is currently mostly a stub for later pat

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Andrew Stubbs
On 30/05/2023 07:26, Richard Biener wrote: On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote: Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and

[committed] amdgcn: 64-bit not

2022-07-29 Thread Andrew Stubbs
I've committed this patch to enable DImode one's-complement on amdgcn. The hardware doesn't have 64-bit not, and this isn't needed by expand which is happy to use two SImode operations, but the vectorizer isn't so clever. Vector condition masks are DImode on amdgcn, so this has been causing lo
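
Conceptually, the expansion the vectorizer needed spelling out is the same trick the scalar expander already uses, applied per lane; a plain C illustration (not the machine-description pattern itself):

    /* A 64-bit one's-complement is two 32-bit NOTs on the low and high halves.  */
    unsigned long long
    not_di (unsigned long long x)
    {
      unsigned int lo = ~(unsigned int) x;
      unsigned int hi = ~(unsigned int) (x >> 32);
      return ((unsigned long long) hi << 32) | lo;
    }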

[committed] amdgcn: 64-bit vector shifts

2022-07-29 Thread Andrew Stubbs
I've committed this patch to implement V64DImode vector-vector and vector-scalar shifts. In particular, these are used by the SIMD "inbranch" clones that I'm working on right now, but it's an omission that ought to have been fixed anyway. Andrew amdgcn: 64-bit vector shifts Enable 64-bit vec

[PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
This patch adjusts the generation of SIMD "inbranch" clones that use integer masks to ensure that it vectorizes on amdgcn. The problem was only that an amdgcn mask is DImode and the shift amount was SImode, and the difference causes vectorization to fail. OK for mainline? Andrew openmp-simd-c

Re: [PATCH] openmp-simd-clone: Match shift type

2022-07-29 Thread Andrew Stubbs
On 29/07/2022 16:59, Jakub Jelinek wrote: Doing the fold_convert seems to be a wasted effort to me. Can't this be done conditional on whether some change is needed at all and just using gimple_build_assign with NOP_EXPR, so something like: I'm just not familiar enough with this stuff to run fol
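
A hedged sketch of the conditional-conversion idea being discussed, as a fragment rather than a complete function; the variable names (mask, shift_cnt, gsi) are placeholders, not the actual patch. The point is to emit the NOP_EXPR assignment only when the shift amount's type really differs from the mask type.

    /* Sketch only: convert the shift amount to the mask's type when needed.  */
    if (!useless_type_conversion_p (TREE_TYPE (mask), TREE_TYPE (shift_cnt)))
      {
        tree conv = make_ssa_name (TREE_TYPE (mask));
        gimple *g = gimple_build_assign (conv, NOP_EXPR, shift_cnt);
        gsi_insert_before (&gsi, g, GSI_SAME_STMT);
        shift_cnt = conv;
      }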

[committed] amdgcn: Vector procedure call ABI

2022-08-09 Thread Andrew Stubbs
I've committed this patch for amdgcn. This changes the procedure calling ABI such that vector arguments are passed in vector registers, rather than on the stack as before. The ABI for scalar functions is the same for arguments, but the return value has now moved to a vector register; keeping

[PATCH 0/3] OpenMP SIMD routines

2022-08-09 Thread Andrew Stubbs
ure has backend support for the clones at this time. OK for mainline (patches 1 & 3)? Thanks Andrew Andrew Stubbs (3): omp-simd-clone: Allow fixed-lane vectors amdgcn: OpenMP SIMD routine support vect: inbranch SIMD clones gcc/config/gcn/gcn.cc | 63 gcc

[PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-09 Thread Andrew Stubbs
The vecsize_int/vecsize_float has an assumption that all arguments will use the same bitsize, and vary the number of lanes according to the element size, but this is inappropriate on targets where the number of lanes is fixed and the bitsize varies (i.e. amdgcn). With this change the vecsize can
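
To make the distinction concrete (the numbers are examples, not values from the patch): in the bitsize-based model the lane count is derived from a fixed vector width, whereas on a fixed-lane target like amdgcn the vector width in bits is derived from the element size.

    /* Bitsize-based model: lanes vary with the element size.  */
    unsigned
    lanes_from_bitsize (unsigned vecsize_bits, unsigned elt_bits)
    {
      return vecsize_bits / elt_bits;   /* e.g. 512 / 32 = 16 lanes, 512 / 64 = 8 */
    }

    /* Fixed-lane model: the bit width varies instead.  */
    unsigned
    bits_from_lanes (unsigned fixed_lanes, unsigned elt_bits)
    {
      return fixed_lanes * elt_bits;    /* e.g. 64 * 32 = 2048, 64 * 64 = 4096 */
    }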

[PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-09 Thread Andrew Stubbs
Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support calling them yet. gcc/ChangeLog: * config/gcn/gcn.cc (g

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs
There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see). This patch add supports for a sub-set of possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be n

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-30 Thread Andrew Stubbs
On 26/08/2022 12:04, Jakub Jelinek wrote: gcc/ChangeLog: * doc/tm.texi: Regenerate. * omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero vecsize. (simd_clone_adjust_argument_types): Likewise. * target.def (compute_vecsize_and_simdlen): Document

Re: [PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-30 Thread Andrew Stubbs
On 09/08/2022 14:23, Andrew Stubbs wrote: Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support c

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-31 Thread Andrew Stubbs
On 31/08/2022 09:29, Jakub Jelinek wrote: On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote: --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node) veclen = node->simdclone->vecsize_int; else

Re: [PATCH] testsuite: Fix up vect-simd-clone1[678]*.c tests [PR108898]

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 12:14, Jakub Jelinek wrote: Hi! As mentioned in the PR, vect-simd-clone-1[678]{,f}.c tests FAIL on x86_64-linux with -m64/-march=cascadelake or -m32/-march=cascadelake, there are 3 matches for the calls rather than expected two. As suggested by Richi, this patch changes those tests

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 13:35, Andrew Jenner wrote: I have updated this patch to incorporate the feedback from Andrew Stubbs. Tested on CDNA2 GFX90a. gcc/ChangeLog: * config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn) (gcn_expand_dpp_distribute_even_insn

Re: [PATCH] amdgcn: Add accumulator VGPR registers

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 13:42, Andrew Jenner wrote: This patch allows GCC to use the accumulator VGPR registers on CDNA1 and later architectures. The backend does not yet attempt to make use of the matrix acceleration instructions, but the new registers are still useful as fast space for register spills.

[committed] amdgcn: vec_extract no-op insns

2023-03-23 Thread Andrew Stubbs
This patch adds new pseudo-insns for no-op vector extractions. These were previously modelled as simple move instructions, but the register allocator has unhelpful special handling for these that triggered spills to memory. Modelling them as a vec_select does the right thing in the register al

[committed] amdgcn: Fix register size bug

2023-03-23 Thread Andrew Stubbs
This patch fixes a bug in which the function prologue would save more registers to the stack than there was space allocated. This would cause data corruption when the epilogue restored the registers if a child function had overwritten that memory. The problem was caused by insn constraints tha

Re: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory)

2023-03-27 Thread Andrew Stubbs
On 27/03/2023 12:26, Thomas Schwinge wrote: Hi! On 2023-03-27T09:27:31+, "Stubbs, Andrew" wrote: -Original Message- From: Thomas Schwinge Sent: 24 March 2023 15:50 On 2022-01-04T15:32:17+0000, Andrew Stubbs wrote: This patch implements the OpenMP pinned memory trait

[committed] amdgcn: Add 64-bit vector not

2023-04-04 Thread Andrew Stubbs
I've committed this patch to add a missing vector operator on amdgcn. The architecture doesn't have a 64-bit not instruction so we didn't have an insn for it, but the vectorizer didn't like that and caused the v64df_pow function to use 2MB of stack frame. This is a problem when you typically h

Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andrew Stubbs
Hi Andre, I don't have a cascadelake device to test on, nor any knowledge about what makes it different from regular x86_64. If the cascadelake device is supposed to work the same as other x86_64 devices for these vectors then the test has found a bug in the compiler and you should be lookin

[committed] amdgcn: HardFP divide

2023-04-18 Thread Andrew Stubbs
This patch switches amdgcn from using softfp for division to using the hardware support. There isn't a single division instruction, but there is an instruction sequence that gives the necessary accuracy. This implementation also allows fully vectorized division, so gives good performance impro

[committed] amdgcn: update target-supports.exp

2023-04-20 Thread Andrew Stubbs
Recent patches have enabled new capabilities on AMD GCN, but not all the testsuite features were enabled. The hardfp divide patch actually had a test regression because the expected results were too conservative. This patch corrects both issues. Andrew amdgcn: update target-supports.exp The b

[committed] amdgcn: bug fix ldexp insn

2023-04-20 Thread Andrew Stubbs
The hardfp division patch exposed a flaw in the ldexp pattern at -O0; the compiler was trying to use out-of-range immediates on VOP3 instruction encodings. This patch changes the constraints appropriately, and also takes the opportunity to combine the two patterns into one using the newly ava

[committed][OG10] amdgcn, openmp: Fix concurrency in low-latency allocator

2023-04-20 Thread Andrew Stubbs
I've committed this to the devel/omp/gcc-12 branch. The patch fixes a concurrency issue where the spin-locks didn't work well if many GPU threads tried to free low-latency memory all at once. Adding a short sleep instruction is enough for the hardware thread to yield and allow another to proc
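
A minimal sketch of the idea, assuming a GCN-style s_sleep instruction as the yield; the real allocator's lock layout and sleep duration may differ. The short sleep lets other hardware threads make progress instead of every thread hammering the lock word at once.

    static void
    spin_lock (unsigned int *lock)
    {
      /* While another thread holds the lock, yield briefly before retrying.  */
      while (__atomic_exchange_n (lock, 1, __ATOMIC_ACQUIRE) != 0)
        asm volatile ("s_sleep 1" ::: "memory");
    }

    static void
    spin_unlock (unsigned int *lock)
    {
      __atomic_store_n (lock, 0, __ATOMIC_RELEASE);
    }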

[OG12][committed] libgomp: Fix USM bugs

2022-12-16 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some missed cases in the Unified Shared Memory implementation that were especially noticeable in Fortran because the size of arrays are known. This patch will have to be folded into the mainline USM patches that were submitted

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-01-06 Thread Andrew Stubbs
Here's a new version of the patch. On 01/12/2022 14:16, Jakub Jelinek wrote: +void __attribute__((noinline)) You should use noipa attribute instead of noinline on callers which aren't declare simd (on declare simd it would prevent cloning which is essential for the declare simd behavior), so t

[OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-11 Thread Andrew Stubbs
This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory configured into USM mode, but there were random intermittent crashes. I can't be completely sure, but my best guess is that the HSA driver is using malloc

Re: [OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-13 Thread Andrew Stubbs
I changed it to use 128-byte alignment to match the GPU cache-lines. Committed to OG12. Andrew On 11/01/2023 18:05, Andrew Stubbs wrote: This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory confi

Re: [PATCH, OpenMP 5.0] More implementation of the requires directive

2022-03-29 Thread Andrew Stubbs
On 13/01/2021 15:07, Chung-Lin Tang wrote: We currently emit errors, but do not fatally cause exit of the program if those are not met. We're still unsure if complete block-out of program execution is the right thing for the user. This can be discussed later. After the Unified Shared Memory p

Re: [PATCH 5/5] openmp: -foffload-memory=pinned

2022-03-30 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: gcc/ChangeLog: * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. This worked for x86_64, but I needed to make the attached adjustment to work on powerpc without a linker error.

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: This patches changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 documen
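
The effect of the transformation, shown as a hand-written before/after rather than the compiler pass itself; ompx_unified_shared_mem_alloc is the extension allocator this patch series adds, and the rest is illustrative.

    #include <omp.h>
    #include <stdlib.h>

    /* Before: plain host allocation.  */
    void before (size_t n)
    {
      int *p = malloc (n);
      free (p);
    }

    /* After: the same allocation routed through libgomp's USM allocator.  */
    void after (size_t n)
    {
      int *p = omp_alloc (n, ompx_unified_shared_mem_alloc);
      omp_free (p, ompx_unified_shared_mem_alloc);
    }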

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 02/04/2022 13:04, Andrew Stubbs wrote: This additional patch adds a transformation for omp_target_alloc. The OpenMP 5.0 document says that addresses allocated this way need to work without is_device_ptr. The easiest way to make that work is to make them USM addresses. Actually, reading on

Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.

2022-04-13 Thread Andrew Stubbs
This patch adjusts the testcases, previously proposed, to allow for testing on machines with varying page sizes and default amounts of lockable memory. There turns out to be more variation than I had thought. This should go on mainline at the same time as the previous patches in this thread.

[PATCH] openmp: Handle unified address memory.

2022-04-20 Thread Andrew Stubbs
This patch adds enough support for "requires unified_address" to make the sollve_vv testcases pass. It implements unified_address as a synonym of unified_shared_memory, which is both valid and the only way I know of to unify addresses with Cuda (could be wrong). This patch should be applied on
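
An example of the directive the patch makes work, in the style of the sollve_vv tests; the device choice and values are illustrative. With the patch, unified_address behaves the same as unified_shared_memory.

    #include <omp.h>

    #pragma omp requires unified_address

    int
    main (void)
    {
      int dev = omp_get_default_device ();
      int *p = (int *) omp_target_alloc (sizeof (int), dev);

      /* With unified addresses the pointer value is meaningful on both sides.  */
      #pragma omp target is_device_ptr (p) device (dev)
        *p = 42;

      omp_target_free (p, dev);
      return 0;
    }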

Re: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)

2022-04-28 Thread Andrew Stubbs
On 06/04/2022 11:02, Thomas Schwinge wrote: Hi! On 2021-01-14T15:50:23+0100, I wrote: I'm raising here an issue with HSA libgomp plugin code changes from a while ago. While HSA is now no longer relevant for GCC master branch, the same code has also been copied into the GCN libgomp plugin. He

[committed] amdgcn: Fix addsub bug

2023-04-27 Thread Andrew Stubbs
I've committed this patch to fix a couple of bugs introduced in the recent CMul patch. First, the fmsubadd insn was accidentally all adds and no subtracts. Second, there were input dependencies on the undefined output register which caused the compiler to reserve unnecessary slots in the stac

Re: [Patch] GCN: Silence unused-variable warning

2023-05-05 Thread Andrew Stubbs
On 05/05/2023 12:10, Tobias Burnus wrote: Probably added for symmetry with out_mode/out_n but at the end not used. That function was added in commit   r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math functions Tested the removal by building with that patch applied. OK for mainl

Re: [PATCH] amdgcn: Fix instruction generation for exp2 and log2 operations

2022-11-03 Thread Andrew Stubbs
On 03/11/2022 17:47, Kwok Cheung Yeung wrote: Hello This patch fixes a bug introduced in a previous patch adding support for generating native instructions for the exp2 and log2 patterns. The problem is that the name of the instruction implementing the exp2 operation is v_exp (and not v_exp2)

Re: [PATCH] amdgcn: Add builtins for vectorized native versions of abs, floorf and floor

2022-11-08 Thread Andrew Stubbs
On 08/11/2022 14:35, Kwok Cheung Yeung wrote: Hello This patch adds three extra builtins for the vectorized forms of the abs, floorf and floor math functions, which are implemented by native GCN instructions. I have also added a test to check that they generate the expected assembler instruct

Re: [patch] gcn: Add __builtin_gcn_kernarg_ptr

2022-11-16 Thread Andrew Stubbs
On 16/11/2022 11:42, Tobias Burnus wrote: This is a part of a patch by Andrew (hi!) - namely that part that only adds the __builtin_gcn_kernarg_ptr. More is planned, see below. The short term benefit of this patch is to permit replacing hardcoded numbers by a builtin – like in libgomp (see pa

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-02-14 Thread Andrew Stubbs
On 09/02/2023 20:13, Andrew Jenner wrote: This patch introduces instruction patterns for complex number operations in the GCN machine description. These patterns are cmul, cmul_conj, vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls (cmla_conj and cmls_conj were not found t

Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Andrew Stubbs
On 14/02/2023 12:54, Thomas Schwinge wrote: Hi Andrew! On 2022-01-13T11:13:51+, Andrew Stubbs wrote: Updated patch: this version fixes some missed cases of malloc in the realloc implementation. Right, and as it seems I've run into another issue: a stray 'free'.

[OG12][committed] amdgcn: OpenMP low-latency allocator

2023-02-16 Thread Andrew Stubbs
These patches implement an LDS memory allocator for OpenMP on AMD. 1. 230216-basic-allocator.patch Separate the allocator from NVPTX so the code can be shared. 2. 230216-amd-low-lat.patch Allocate the memory, adjust the default address space, and hook up the allocator. They will need to be

Re: [og12] Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

2023-02-20 Thread Andrew Stubbs
On 17/02/2023 08:12, Thomas Schwinge wrote: Hi Andrew! On 2023-02-16T23:06:44+0100, I wrote: On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches" wrote: The mmap implementation was not optimized for a lot of small allocations, and I can't see that issue changing here That's corre

Re: [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)

2023-02-20 Thread Andrew Stubbs
On 16/02/2023 21:11, Thomas Schwinge wrote: --- /dev/null +++ b/libgomp/basic-allocator.c +#ifndef BASIC_ALLOC_YIELD +#deine BASIC_ALLOC_YIELD +#endif In file included from [...]/libgomp/config/nvptx/allocator.c:49: [...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-23 Thread Andrew Stubbs
On 10/02/2023 09:11, Jakub Jelinek wrote: I've tried to fix the -flto thing and I can't figure out how. The problem seems to be that there are two dump files from the two compiler invocations and it scans the wrong one. Aarch64 has the same problem. Two dumps are because it is in a dg-do run te

[committed][OG12] libgomp: no need to attach USM pointers

2023-02-23 Thread Andrew Stubbs
This patch fixes a bug in which libgomp doesn't know what to do with attached pointers in fortran derived types when using Unified Shared Memory instead of explicit mappings. I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it into the next rebase/repost of the USM patches

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 28/02/2023 23:01, Kwok Cheung Yeung wrote: Hello This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION target hook for the AMD GCN architecture, such that when vectorized, calls to builtin standard math functions such as asinf, exp, pow etc. are converted to calls to the r

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 01/03/2023 10:52, Andre Vieira (lists) wrote: On 01/03/2023 10:01, Andrew Stubbs wrote: > On 28/02/2023 23:01, Kwok Cheung Yeung wrote: >> Hello >> >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> target hook for the AMD GCN a

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-02 Thread Andrew Stubbs
On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows the exec register to be saved in SGPRs to avoid spilling to memory. Tested on GCN3 Fiji gfx803

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-02 Thread Andrew Stubbs
On 02/03/2023 15:07, Kwok Cheung Yeung wrote: Hello I've made the suggested changes. Should I hold off on committing this until GCC 13 has been branched off? No need, amdgcn is not a primary target and this stuff won't affect anyone else. Please go ahead and commit. Andrew

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-06 Thread Andrew Stubbs
On 03/03/2023 17:05, Paul-Antoine Arras wrote: On 02/03/2023 at 18:18, Andrew Stubbs wrote: On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Andrew Stubbs
On 08/03/2023 11:06, Tobias Burnus wrote: Next try – this time with both patches. On 08.03.23 12:05, Tobias Burnus wrote: Hi Andrew, attached are two patches related to GCN, one for libgomp.texi documenting an env var and a release-notes update in www docs. OK? Comments? LGTM Andrew

Re: [Patch] gcn/mkoffload.cc: Pass -save-temps on for the hsaco step

2023-03-13 Thread Andrew Stubbs
On 13/03/2023 12:25, Tobias Burnus wrote: Found when comparing '-v -Wl,-v' output as despite -save-temps multiple runs yielded differed results. Fixed as attached. OK for mainline? OK. Andrew

[committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
ssues, but rather existing problems that did not show up because the code did not previously vectorize. Expanding the testcase to allow 64-lane vectors shows the same problems there. I shall backport these patches to the OG12 branch shortly. Andrew Andrew Stubbs (6): amdgcn: add multiple ve

[committed 2/6] amdgcn: Resolve insn conditions at compile time

2022-10-11 Thread Andrew Stubbs
GET_MODE_NUNITS isn't a compile time constant, so we end up with many impossible insns in the machine description. Adding MODE_VF allows the insns to be eliminated completely. gcc/ChangeLog: * config/gcn/gcn-valu.md (2): Use MODE_VF. (2): Likewise. * config/gcn/g

[committed 3/6] amdgcn: Add vec_extract for partial vectors

2022-10-11 Thread Andrew Stubbs
Add vec_extract expanders for all valid pairs of vector types. gcc/ChangeLog: * config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants. * config/gcn/gcn-valu.md (vec_extract): New define_expand. * config/gcn/gcn.cc (get_exec): Export the existing func

[committed 4/6] amdgcn: vec_init for multiple vector sizes

2022-10-11 Thread Andrew Stubbs
Implements vec_init when the input is a vector of smaller vectors, or of vector MEM types, or a smaller vector duplicated several times. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_init): New. * config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3. (GEN_VNM): Add gathervNm

[committed 1/6] amdgcn: add multiple vector sizes

2022-10-11 Thread Andrew Stubbs
The vectors sizes are simulated using implicit masking, but they make life easier for the autovectorizer and SLP passes. gcc/ChangeLog: * config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF, V16QI, V16HI, V16SI, V16

[committed 5/6] amdgcn: Add vector integer negate insn

2022-10-11 Thread Andrew Stubbs
Another example of the vectorizer needing explicit insns where the scalar expander just works. gcc/ChangeLog: * config/gcn/gcn-valu.md (neg2): New define_expand. --- gcc/config/gcn/gcn-valu.md | 13 + 1 file changed, 13 insertions(+) diff --git a/gcc/config/gcn/gcn-valu.md

[committed 6/6] amdgcn: vector testsuite tweaks

2022-10-11 Thread Andrew Stubbs
The testsuite needs a few tweaks following my patches to add multiple vector sizes for amdgcn. gcc/testsuite/ChangeLog: * gcc.dg/pr104464.c: Xfail on amdgcn. * gcc.dg/signbit-2.c: Likewise. * gcc.dg/signbit-5.c: Likewise. * gcc.dg/vect/bb-slp-68.c: Likewise.

Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
On 11/10/2022 12:29, Richard Biener wrote: On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs wrote: This patch series adds additional vector sizes for the amdgcn backend. The hardware supports any arbitrary vector length up to 64-lanes via masking, but GCC cannot (yet) make full use of them due

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-10-12 Thread Andrew Stubbs
On 12/10/2022 15:29, Tobias Burnus wrote: On 29.09.22 18:24, Andrew Stubbs wrote: On 27/09/2022 14:16, Tobias Burnus wrote: Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I like to have some guidance

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-14 Thread Andrew Stubbs
On 14/10/2022 08:07, Richard Biener wrote: On Tue, 11 Oct 2022, Richard Sandiford wrote: Richard Biener writes: On Mon, 10 Oct 2022, Andrew Stubbs wrote: On 10/10/2022 12:03, Richard Biener wrote: The following picks up the prototype by Ju-Zhe Zhong for vectorizing first order recurrences

[PATCH] libgomp: fix hang on fatal error

2022-10-14 Thread Andrew Stubbs
This patch fixes a problem in which fatal errors inside mutex-locked regions (i.e. basically anything in the plugin) will cause it to hang up trying to take the lock to clean everything up. Using abort() instead of exit(1) bypasses the atexit handlers and solves the problem. OK for mainline?
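
Illustrative only, not libgomp's actual code: why exit() can hang when a fatal error fires while a lock is held, and why abort() does not.

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Registered with atexit(); tries to take the lock to clean up.  */
    static void
    cleanup (void)
    {
      pthread_mutex_lock (&lock);   /* deadlocks if the lock is already held */
      pthread_mutex_unlock (&lock);
    }

    /* Called with 'lock' held, e.g. from inside a plugin callback.  */
    static void
    fatal_error (void)
    {
      abort ();      /* atexit handlers are skipped, so no deadlock */
      /* exit (1);      would run cleanup() and hang on the held lock */
    }

    int
    main (void)
    {
      atexit (cleanup);
      pthread_mutex_lock (&lock);
      fatal_error ();
    }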

[OG12 commit] amdgcn, libgomp: USM allocation update

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The patch changes the internal memory allocation method such that memory is allocated in the regular heap and then marked as "coarse-grained

[OG12 commit] amdgcn: disallow USM on gfx908

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The GFX908 (MI100) devices only partially support the Unified Shared Memory model that we have, and only then with additional kernel boot p

[OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-24 Thread Andrew Stubbs
I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase fails to compile due to an ICE. The OpenACC worker broadcasting

Re: [OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-27 Thread Andrew Stubbs
On 24/10/2022 19:06, Richard Biener wrote: On 24.10.2022 at 18:51, Andrew Stubbs wrote: I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this, the libgomp.oac

[committed] amdgcn: Silence unused parameter warning

2022-10-31 Thread Andrew Stubbs
A function parameter was left over from a previous draft of my multiple-vector-length patch. This patch silences the harmless warning. Andrew amdgcn: Silence unused parameter warning gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Set base_type a

[committed] amdgcn: add fmin/fmax patterns

2022-10-31 Thread Andrew Stubbs
This patch adds patterns for the fmin and fmax operators, for scalars, vectors, and vector reductions. The compiler uses smin and smax for most floating-point optimizations, etc., but not where the user calls fmin/fmax explicitly. On amdgcn the hardware min/max instructions are already IEEE c

[committed] amdgcn: multi-size vector reductions

2022-10-31 Thread Andrew Stubbs
My recent patch to add additional vector lengths didn't address the vector reductions yet. This patch adds the missing support. Shorter vectors use fewer reduction steps, and the means to extract the final value has been adjusted. Lacking from this is any useful costs, so for loops the vect p

Re: [Patch] gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

2022-05-16 Thread Andrew Stubbs
On 16/05/2022 11:28, Tobias Burnus wrote: While 'vendor' and 'kind' is well defined, 'arch' and 'isa' isn't. When looking at an 'metadirective' testcase (which oddly uses 'arch(amd)'), I noticed that LLVM uses 'arch(amdgcn)' while we use 'gcn', cf. e.g. 'clang/lib/Headers/openmp_wrappers/math.h'

Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-19 Thread Andrew Stubbs
On 19/05/2022 17:00, Jakub Jelinek wrote: Without requires dynamic_allocators, there are various extra restrictions imposed: 1) omp_init_allocator/omp_destroy_allocator may not be called (except for implicit calls to it from uses_allocators) in a target region I interpreted that more like "

[committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0.1 is now the oldest version not known to have compatibility issues. The patch removes all the obsolete feature

[committed] amdgcn: Add gfx90a support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to add support for gfx90a AMD GPU devices. The patch updates all the places that have architecture/ISA specific code, tidies up the ISA naming and handling in the backend, and adds a new multilib. This is just lightly tested at this point, but there are no known issu

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support Attached is an attempt to update invoke.texi I've deliberately avoided the MI100 and MI200 names because they're really not that simple. MI100 is gfx908, but MI150 is

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 25/05/2022 12:16, Tobias Burnus wrote: On 25.05.22 11:18, Andrew Stubbs wrote: On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support I've deliberately avoided the MI100 and MI200 names because they're really not that simple

Re: [committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-06-06 Thread Andrew Stubbs
On 27/05/2022 20:16, Thomas Schwinge wrote: Hi Andrew! On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote: I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
't know how that'll handle heterogeneous systems, but those ought to be rare. I don't think libmemkind will resolve this performance issue, although certainly it can be used for host implementations of low-latency memories, etc. Andrew On 13/01/2022 13:53, Andrew Stubbs wrote:

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
On 07/06/2022 13:10, Jakub Jelinek wrote: On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote: Following some feedback from users of the OG11 branch I think I need to withdraw this patch, for now. The memory pinned via the mlock call does not give the expected performance boost. I

[OG11, committed] libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

2021-08-04 Thread Andrew Stubbs
This patch fixes a bug in which testcases using thread_limit larger than the number of physical threads would crash with a memory fault. This was exacerbated in testcases with a lot of register pressure because the autoscaling reduces the number of physical threads to compensate for the increas

Re: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2022-02-02 Thread Andrew Stubbs
On 02/02/2022 15:39, Tobias Burnus wrote: On 09.08.21 15:55, Tobias Burnus wrote: Now that the GCN/OpenACC patches for this have been committed today, I think it makes sense to add it to the documentation. (I was told that some follow-up items are still pending, but as the feature does work ...)

[committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so we ended up with an unrecognised instruction ICE.

[OG11][committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
On 14/02/2022 14:13, Andrew Stubbs wrote: I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so

Re: [Patch] GCN: Implement __atomic_compare_exchange_{1, 2} in libgcc [PR102215]

2022-03-09 Thread Andrew Stubbs
On 09/03/2022 16:29, Tobias Burnus wrote: This shows up with with OpenMP offloading as libgomp since a couple of months uses __atomic_compare_exchange (see PR for details), causing link errors when the gcn libgomp.a is linked. It also shows up with sollve_vv. The implementation does a bit copy'n

[PATCH] OpenMP: Ensure that offloaded variables are public

2021-11-16 Thread Andrew Stubbs
Hi, This patch is needed for AMD GCN offloading when we use the assembler from LLVM 13+. The GCN runtime (libgomp+ROCm) requires that the locations of all variables in the offloaded variables table are discoverable at runtime (using the "hsa_executable_symbol_get_info" API), and this only wor
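
A sketch of the runtime-side lookup that motivates the requirement, using the HSA API named above; error handling is omitted and the helper name is made up.

    #include <stdint.h>
    #include <hsa.h>

    /* Look up an offloaded variable's device address by name.  This only
       succeeds if the symbol is visible (public) in the loaded code object.  */
    static uint64_t
    lookup_offload_var (hsa_executable_t exec, hsa_agent_t agent, const char *name)
    {
      hsa_executable_symbol_t sym;
      uint64_t addr = 0;

      hsa_executable_get_symbol_by_name (exec, name, &agent, &sym);
      hsa_executable_symbol_get_info (sym,
                                      HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
                                      &addr);
      return addr;
    }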

Re: Host and offload targets have no common meaning of address spaces (was: [ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref')

2021-09-03 Thread Andrew Stubbs
On 24/08/2021 12:43, Richard Biener via Gcc-patches wrote: On Tue, Aug 24, 2021 at 12:23 PM Thomas Schwinge wrote: Hi! On 2021-08-19T22:13:56+0200, I wrote: On 2021-08-16T10:21:04+0200, Jakub Jelinek wrote: On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote: |> Concerning the

[committed] amdgcn: Support LLVM 13 assembler syntax

2021-10-07 Thread Andrew Stubbs
I've committed this patch to allow GCC to adapt to the different variants of the LLVM amdgcn assembler. Unfortunately they keep making changes without maintaining backwards compatibility. GCC should now work with LLVM 9, LLVM 12, and LLVM 13 in terms of CLI usage, however only LLVM 9 is well t
