On 08/09/2023 10:04, Tobias Burnus wrote:
Regarding patch 2/3 and MEMSPACE_VALIDATE.
In general, I wonder how to handle memory spaces (and traits) that
aren't supported. Namely, when to return 0L and when to silently
ignore the trait / use another memory space.
The current omp_init_allocat
On 22/11/2023 14:26, Tobias Burnus wrote:
Hi Andrew,
Side remark:
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
-  calloc (1, (((void)(MEMSPACE), (SIZE
This fits a bit more to previous patch, but I wonder whether that should
use (MEMSPACE, NMEMB, SIZE) instead - to fit to the actual calloc
On 21/10/13 23:01, Yufeng Zhang wrote:
Hi,
This patch changes the widening_mul pass to fuse the widening multiply
with accumulate only when the multiply has single use. The widening_mul
pass currently does the conversion regardless of the number of the uses,
which can cause poor code-gen in cas
On 17/08/12 15:04, Richard Earnshaw wrote:
The fix is to make is_widening_mult_p note that it has been called with
a WIDEN_MULT_EXPR and rather than decompose the operands again, to
simply extract the existing operands, which have already been formulated
correctly for a widening multiply operatio
On 17/08/12 15:31, Richard Earnshaw wrote:
On 17/08/12 15:22, Andrew Stubbs wrote:
On 17/08/12 15:04, Richard Earnshaw wrote:
The fix is to make is_widening_mult_p note that it has been called with
a WIDEN_MULT_EXPR and rather than decompose the operands again, to
simply extract the existing
On 17/08/12 15:47, Richard Earnshaw wrote:
If we don't have a 16x16->64 mult operation then after step 1 we'll
still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2
there's nothing to short circuit.
Unless, of course, you're expecting us to get
step1 -> 16x16->32 widen mult
step
On 17/08/12 16:20, Richard Earnshaw wrote:
No, given a u16xu16->u64 operation in the code, and that the arch
doesn't have such an opcode, I'd expect to get
step1 -> (u32)u16 x (u32)u16 -> u64
Hmm, I would have thought that would be more costly than
(u64)(u16 x u16 -> u32)
You might
On 17/08/12 18:05, Richard Earnshaw wrote:
Take two.
This version should address your concerns about handling
(u32)u16 * (u32)u16 -> u64
We now look at each operand directly, but when doing so we check whether
the operand is the same size as the result or not. When it is, we can
strip
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.
Therefore, this patch adds minimal support for "complex vector int"
modes.
OK.
Andrew
On 26/05/2023 15:58, Tobias Burnus wrote:
(Update the syntax of the amdgcn commandline option in anticipation of
later patches;
while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for
PR100208),
-mxnack (contrary to -msram-ecc) is currently mostly a stub for later
pat
On 30/05/2023 07:26, Richard Biener wrote:
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and
I've committed this patch to enable DImode one's-complement on amdgcn.
The hardware doesn't have 64-bit not, and this isn't needed by expand
which is happy to use two SImode operations, but the vectorizer isn't so
clever. Vector condition masks are DImode on amdgcn, so this has been
causing lo
I've committed this patch to implement V64DImode vector-vector and
vector-scalar shifts.
In particular, these are used by the SIMD "inbranch" clones that I'm
working on right now, but it's an omission that ought to have been fixed
anyway.
Andrew
amdgcn: 64-bit vector shifts
Enable 64-bit vec
This patch adjusts the generation of SIMD "inbranch" clones that use
integer masks to ensure that it vectorizes on amdgcn.
The problem was only that an amdgcn mask is DImode and the shift amount
was SImode, and the difference causes vectorization to fail.
OK for mainline?
Andrew
openmp-simd-c
On 29/07/2022 16:59, Jakub Jelinek wrote:
Doing the fold_convert seems to be a wasted effort to me.
Can't this be done conditional on whether some change is needed at all
and just using gimple_build_assign with NOP_EXPR, so something like:
I'm just not familiar enough with this stuff to run fol
I've committed this patch for amdgcn.
This changes the procedure calling ABI such that vector arguments are
passed in vector registers, rather than on the stack as before.
The ABI for scalar functions is the same for arguments, but the return
value has now moved to a vector register; keeping
ure has
backend support for the clones at this time.
OK for mainline (patches 1 & 3)?
Thanks
Andrew
Andrew Stubbs (3):
omp-simd-clone: Allow fixed-lane vectors
amdgcn: OpenMP SIMD routine support
vect: inbranch SIMD clones
gcc/config/gcn/gcn.cc | 63
gcc
The vecsize_int/vecsize_float has an assumption that all arguments will use
the same bitsize, and vary the number of lanes according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).
With this change the vecsize can
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.
gcc/ChangeLog:
* config/gcn/gcn.cc (g
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a subset of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be n
On 26/08/2022 12:04, Jakub Jelinek wrote:
gcc/ChangeLog:
* doc/tm.texi: Regenerate.
* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
vecsize.
(simd_clone_adjust_argument_types): Likewise.
* target.def (compute_vecsize_and_simdlen): Document
On 09/08/2022 14:23, Andrew Stubbs wrote:
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support c
On 31/08/2022 09:29, Jakub Jelinek wrote:
On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote:
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
veclen = node->simdclone->vecsize_int;
else
On 21/03/2023 12:14, Jakub Jelinek wrote:
Hi!
As mentioned in the PR, vect-simd-clone-1[678]{,f}.c tests FAIL on
x86_64-linux with -m64/-march=cascadelake or -m32/-march=cascadelake,
there are 3 matches for the calls rather than expected two.
As suggested by Richi, this patch changes those tests
On 21/03/2023 13:35, Andrew Jenner wrote:
I have updated this patch to incorporate the feedback from Andrew
Stubbs. Tested on CDNA2 GFX90a.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn)
(gcn_expand_dpp_distribute_even_insn
On 21/03/2023 13:42, Andrew Jenner wrote:
This patch gives GCC to use the accumulator VGPR registers on CDNA1 and
later architectures. The backend does not yet attempt to make use of the
matrix acceleration instructions, but the new registers are still useful
as fast space for register spills.
This patch adds new pseudo-insns for no-op vector extractions.
These were previously modelled as simple move instructions, but the
register allocator has unhelpful special handling for these that
triggered spills to memory. Modelling them as a vec_select does the
right thing in the register al
This patch fixes a bug in which the function prologue would save more
registers to the stack than there was space allocated. This would cause
data corruption when the epilogue restored the registers if a child
function had overwritten that memory.
The problem was caused by insn constraints tha
On 27/03/2023 12:26, Thomas Schwinge wrote:
Hi!
On 2023-03-27T09:27:31+, "Stubbs, Andrew" wrote:
-Original Message-
From: Thomas Schwinge
Sent: 24 March 2023 15:50
On 2022-01-04T15:32:17+0000, Andrew Stubbs
wrote:
This patch implements the OpenMP pinned memory trait
I've committed this patch to add a missing vector operator on amdgcn.
The architecture doesn't have a 64-bit not instruction so we didn't have
an insn for it, but the vectorizer didn't like that and caused the
v64df_pow function to use 2MB of stack frame. This is a problem when you
typically h
Hi Andre,
I don't have a cascadelake device to test on, nor any knowledge about
what makes it different from regular x86_64.
If the cascadelake device is supposed to work the same as other x86_64
devices for these vectors then the test has found a bug in the compiler
and you should be lookin
This patch switches amdgcn from using softfp for division to using the
hardware support. There isn't a single division instruction, but there
is an instruction sequence that gives the necessary accuracy.
This implementation also allows fully vectorized division, so gives good
performance impro
Recent patches have enabled new capabilities on AMD GCN, but not all the
testsuite features were enabled. The hardfp divide patch actually had a
test regression because the expected results were too conservative.
This patch corrects both issues.
Andrew
amdgcn: update target-supports.exp
The b
The hardfp division patch exposed a flaw in the ldexp pattern at -O0;
the compiler was trying to use out-of-range immediates on VOP3
instruction encodings.
This patch changes the constraints appropriately, and also takes the
opportunity to combine the two patterns into one using the newly
ava
I've committed this to the devel/omp/gcc-12 branch.
The patch fixes a concurrency issue where the spin-locks didn't work
well if many GPU threads tried to free low-latency memory all at once.
Adding a short sleep instruction is enough for the hardware thread to
yield and allow another to proc
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some
missed cases in the Unified Shared Memory implementation that were
especially noticeable in Fortran because the sizes of arrays are known.
This patch will have to be folded into the mainline USM patches that
were submitted
Here's a new version of the patch.
On 01/12/2022 14:16, Jakub Jelinek wrote:
+void __attribute__((noinline))
You should use noipa attribute instead of noinline on callers
which aren't declare simd (on declare simd it would prevent cloning
which is essential for the declare simd behavior), so t
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory configured into USM mode, but
there were random intermittent crashes. I can't be completely sure, but
my best guess is that the HSA driver is using malloc
I changed it to use 128-byte alignment to match the GPU cache-lines.
Committed to OG12.
Andrew
On 11/01/2023 18:05, Andrew Stubbs wrote:
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory confi
On 13/01/2021 15:07, Chung-Lin Tang wrote:
We currently emit errors, but do not fatally cause exit of the program
if those
are not met. We're still unsure if complete block-out of program
execution is the right
thing for the user. This can be discussed later.
After the Unified Shared Memory p
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
gcc/ChangeLog:
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.
This worked for x86_64, but I needed to make the attached adjustment to
work on powerpc without a linker error.
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
This patch changes calls to malloc/free/calloc/realloc and operator new to
memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc.
This additional patch adds transformation for omp_target_alloc. The
OpenMP 5.0 documen
On 02/04/2022 13:04, Andrew Stubbs wrote:
This additional patch adds transformation for omp_target_alloc. The
OpenMP 5.0 document says that addresses allocated this way need to work
without is_device_ptr. The easiest way to make that work is to make them
USM addresses.
Actually, reading on
This patch adjusts the testcases, previously proposed, to allow for
testing on machines with varying page sizes and default amounts of
lockable memory. There turns out to be more variation than I had thought.
This should go on mainline at the same time as the previous patches in
this thread.
This patch adds enough support for "requires unified_address" to make
the sollve_vv testcases pass. It implements unified_address as a synonym
of unified_shared_memory, which is both valid and the only way I know of
to unify addresses with Cuda (could be wrong).
This patch should be applied on
On 06/04/2022 11:02, Thomas Schwinge wrote:
Hi!
On 2021-01-14T15:50:23+0100, I wrote:
I'm raising here an issue with HSA libgomp plugin code changes from a
while ago. While HSA is now no longer relevant for GCC master branch,
the same code has also been copied into the GCN libgomp plugin.
He
I've committed this patch to fix a couple of bugs introduced in the
recent CMul patch.
First, the fmsubadd insn was accidentally all adds and no subtracts.
Second, there were input dependencies on the undefined output register
which caused the compiler to reserve unnecessary slots in the stac
On 05/05/2023 12:10, Tobias Burnus wrote:
Probably added for symmetry with out_mode/out_n but at the end not used.
That function was added in commit
r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math
functions
Tested the removal by building with that patch applied.
OK for mainl
On 03/11/2022 17:47, Kwok Cheung Yeung wrote:
Hello
This patch fixes a bug introduced in a previous patch adding support for
generating native instructions for the exp2 and log2 patterns. The
problem is that the name of the instruction implementing the exp2
operation is v_exp (and not v_exp2)
On 08/11/2022 14:35, Kwok Cheung Yeung wrote:
Hello
This patch adds three extra builtins for the vectorized forms of the
abs, floorf and floor math functions, which are implemented by native
GCN instructions. I have also added a test to check that they generate
the expected assembler instruct
On 16/11/2022 11:42, Tobias Burnus wrote:
This is a part of a patch by Andrew (hi!) - namely that part that only
adds the
__builtin_gcn_kernarg_ptr. More is planned, see below.
The short term benefit of this patch is to permit replacing hardcoded
numbers
by a builtin – like in libgomp (see pa
On 09/02/2023 20:13, Andrew Jenner wrote:
This patch introduces instruction patterns for complex number operations
in the GCN machine description. These patterns are cmul, cmul_conj,
vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls
(cmla_conj and cmls_conj were not found t
On 14/02/2023 12:54, Thomas Schwinge wrote:
Hi Andrew!
On 2022-01-13T11:13:51+, Andrew Stubbs wrote:
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation.
Right, and as it seems I've run into another issue: a stray 'free'.
These patches implement an LDS memory allocator for OpenMP on AMD.
1. 230216-basic-allocator.patch
Separate the allocator from NVPTX so the code can be shared.
2. 230216-amd-low-lat.patch
Allocate the memory, adjust the default address space, and hook up the
allocator.
They will need to be
On 17/02/2023 08:12, Thomas Schwinge wrote:
Hi Andrew!
On 2023-02-16T23:06:44+0100, I wrote:
On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches"
wrote:
The mmap implementation was not optimized for a lot of small allocations, and I
can't see that issue changing here
That's corre
On 16/02/2023 21:11, Thomas Schwinge wrote:
--- /dev/null
+++ b/libgomp/basic-allocator.c
+#ifndef BASIC_ALLOC_YIELD
+#deine BASIC_ALLOC_YIELD
+#endif
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in
On 10/02/2023 09:11, Jakub Jelinek wrote:
I've tried to fix the -flto thing and I can't figure out how. The problem
seems to be that there are two dump files from the two compiler invocations
and it scans the wrong one. Aarch64 has the same problem.
Two dumps are because it is in a dg-do run te
This patch fixes a bug in which libgomp doesn't know what to do with
attached pointers in fortran derived types when using Unified Shared
Memory instead of explicit mappings.
I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it
into the next rebase/repost of the USM patches
On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
Hello
This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
target hook for the AMD GCN architecture, such that when vectorized,
calls to builtin standard math functions such as asinf, exp, pow etc.
are converted to calls to the r
On 01/03/2023 10:52, Andre Vieira (lists) wrote:
On 01/03/2023 10:01, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> target hook for the AMD GCN a
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows the exec register to be saved in SGPRs to avoid spilling to
memory.
Tested on GCN3 Fiji gfx803
On 02/03/2023 15:07, Kwok Cheung Yeung wrote:
Hello
I've made the suggested changes. Should I hold off on committing this
until GCC 13 has been branched off?
No need, amdgcn is not a primary target and this stuff won't affect
anyone else. Please go ahead and commit.
Andrew
On 03/03/2023 17:05, Paul-Antoine Arras wrote:
Le 02/03/2023 à 18:18, Andrew Stubbs a écrit :
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows
On 08/03/2023 11:06, Tobias Burnus wrote:
Next try – this time with both patches.
On 08.03.23 12:05, Tobias Burnus wrote:
Hi Andrew,
attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.
OK? Comments?
LGTM
Andrew
On 13/03/2023 12:25, Tobias Burnus wrote:
Found when comparing '-v -Wl,-v' output as despite -save-temps multiple
runs
yielded differed results.
Fixed as attached.
OK for mainline?
OK.
Andrew
ssues, but rather existing
problems that did not show up because the code did not previously
vectorize. Expanding the testcase to allow 64-lane vectors shows the
same problems there.
I shall backport these patches to the OG12 branch shortly.
Andrew
Andrew Stubbs (6):
amdgcn: add multiple ve
GET_MODE_NUNITS isn't a compile time constant, so we end up with many
impossible insns in the machine description. Adding MODE_VF allows the insns
to be eliminated completely.
gcc/ChangeLog:
* config/gcn/gcn-valu.md
(2): Use MODE_VF.
(2): Likewise.
* config/gcn/g
Add vec_extract expanders for all valid pairs of vector types.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants.
* config/gcn/gcn-valu.md
(vec_extract): New define_expand.
* config/gcn/gcn.cc (get_exec): Export the existing func
Implements vec_init when the input is a vector of smaller vectors, or of
vector MEM types, or a smaller vector duplicated several times.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (vec_init): New.
* config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3.
(GEN_VNM): Add gathervNm
The vector sizes are simulated using implicit masking, but they make life
easier for the autovectorizer and SLP passes.
gcc/ChangeLog:
* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes
V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF,
V16QI, V16HI, V16SI, V16
Another example of the vectorizer needing explicit insns where the scalar
expander just works.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (neg2): New define_expand.
---
gcc/config/gcn/gcn-valu.md | 13 +
1 file changed, 13 insertions(+)
diff --git a/gcc/config/gcn/gcn-valu.md
The testsuite needs a few tweaks following my patches to add multiple vector
sizes for amdgcn.
gcc/testsuite/ChangeLog:
* gcc.dg/pr104464.c: Xfail on amdgcn.
* gcc.dg/signbit-2.c: Likewise.
* gcc.dg/signbit-5.c: Likewise.
* gcc.dg/vect/bb-slp-68.c: Likewise.
On 11/10/2022 12:29, Richard Biener wrote:
On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs wrote:
This patch series adds additional vector sizes for the amdgcn backend.
The hardware supports any arbitrary vector length up to 64-lanes via
masking, but GCC cannot (yet) make full use of them due
On 12/10/2022 15:29, Tobias Burnus wrote:
On 29.09.22 18:24, Andrew Stubbs wrote:
On 27/09/2022 14:16, Tobias Burnus wrote:
Andrew did suggest a while back to piggyback on the console_output
handling,
avoiding another atomic access. - If this is still wanted, I like to
have some
guidance
On 14/10/2022 08:07, Richard Biener wrote:
On Tue, 11 Oct 2022, Richard Sandiford wrote:
Richard Biener writes:
On Mon, 10 Oct 2022, Andrew Stubbs wrote:
On 10/10/2022 12:03, Richard Biener wrote:
The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences
This patch fixes a problem in which fatal errors inside mutex-locked
regions (i.e. basically anything in the plugin) will cause it to hang up
trying to take the lock to clean everything up.
Using abort() instead of exit(1) bypasses the atexit handlers and solves
the problem.
OK for mainline?
I've committed this patch to the devel/omp/gcc-12 branch. I will have to
fold it into my previous OpenMP memory management patch series when I
repost it.
The patch changes the internal memory allocation method such that memory
is allocated in the regular heap and then marked as "coarse-grained
I've committed this patch to the devel/omp/gcc-12 branch. I will have to
fold it into my previous OpenMP memory management patch series when I
repost it.
The GFX908 (MI100) devices only partially support the Unified Shared
Memory model that we have, and only then with additional kernel boot
p
I've committed this to the OG12 branch to remove some test failures. We
probably ought to have something on mainline also, but a proper fix
would be better.
Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
fails to compile due to an ICE. The OpenACC worker broadcasting
On 24/10/2022 19:06, Richard Biener wrote:
Am 24.10.2022 um 18:51 schrieb Andrew Stubbs :
I've committed this to the OG12 branch to remove some test failures. We
probably ought to have something on mainline also, but a proper fix would be
better.
Without this, the libgomp.oac
A function parameter was left over from a previous draft of my
multiple-vector-length patch. This patch silences the harmless warning.
Andrew
amdgcn: Silence unused parameter warning
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen):
Set base_type a
This patch adds patterns for the fmin and fmax operators, for scalars,
vectors, and vector reductions.
The compiler uses smin and smax for most floating-point optimizations,
etc., but not where the user calls fmin/fmax explicitly. On amdgcn the
hardware min/max instructions are already IEEE c
My recent patch to add additional vector lengths didn't address the
vector reductions yet.
This patch adds the missing support. Shorter vectors use fewer reduction
steps, and the means to extract the final value has been adjusted.
Lacking from this is any useful costs, so for loops the vect p
On 16/05/2022 11:28, Tobias Burnus wrote:
While 'vendor' and 'kind' is well defined, 'arch' and 'isa' isn't.
When looking at an 'metadirective' testcase (which oddly uses 'arch(amd)'),
I noticed that LLVM uses 'arch(amdgcn)' while we use 'gcn', cf. e.g.
'clang/lib/Headers/openmp_wrappers/math.h'
On 19/05/2022 17:00, Jakub Jelinek wrote:
Without requires dynamic_allocators, there are various extra restrictions
imposed:
1) omp_init_allocator/omp_destroy_allocator may not be called (except for
implicit calls to it from uses_allocators) in a target region
I interpreted that more like "
I've committed this patch to set the minimum required LLVM version, for
the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a
prerequisite for the gfx90a support, and 13.0.1 is now the oldest
version not known to have compatibility issues.
The patch removes all the obsolete feature
I've committed this patch to add support for gfx90a AMD GPU devices.
The patch updates all the places that have architecture/ISA specific
code, tidies up the ISA naming and handling in the backend, and adds a
new multilib.
This is just lightly tested at this point, but there are no known issu
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
Attached is an attempt to update invoke.texi
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple. MI100 is gfx908, but MI150 is
On 25/05/2022 12:16, Tobias Burnus wrote:
On 25.05.22 11:18, Andrew Stubbs wrote:
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple
On 27/05/2022 20:16, Thomas Schwinge wrote:
Hi Andrew!
On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote:
I've committed this patch to set the minimum required LLVM version, for
the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a
prerequisite for the gfx90a support, and 13.0
don't know
how that'll handle heterogeneous systems, but those ought to be rare.
I don't think libmemkind will resolve this performance issue, although
certainly it can be used for host implementations of low-latency
memories, etc.
Andrew
On 13/01/2022 13:53, Andrew Stubbs wrote:
On 07/06/2022 13:10, Jakub Jelinek wrote:
On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
Following some feedback from users of the OG11 branch I think I need to
withdraw this patch, for now.
The memory pinned via the mlock call does not give the expected performance
boost. I
This patch fixes a bug in which testcases using thread_limit larger than
the number of physical threads would crash with a memory fault. This was
exacerbated in testcases with a lot of register pressure because the
autoscaling reduces the number of physical threads to compensate for the
increas
On 02/02/2022 15:39, Tobias Burnus wrote:
On 09.08.21 15:55, Tobias Burnus wrote:
Now that the GCN/OpenACC patches for this have been committed today,
I think it makes sense to add it to the documentation.
(I was told that some follow-up items are still pending, but as
the feature does work ...)
I've committed this fix for an ICE compiling sollve_vv testcase
test_target_teams_distribute_defaultmap.c.
Somehow the optimizers result in a vector reduction on a vector of
duplicated constants. This was a case the backend didn't handle, so we
ended up with an unrecognised instruction ICE.
On 14/02/2022 14:13, Andrew Stubbs wrote:
I've committed this fix for an ICE compiling sollve_vv testcase
test_target_teams_distribute_defaultmap.c.
Somehow the optimizers result in a vector reduction on a vector of
duplicated constants. This was a case the backend didn't handle, so
On 09/03/2022 16:29, Tobias Burnus wrote:
This shows up with with OpenMP offloading as libgomp since a couple
of months uses __atomic_compare_exchange (see PR for details),
causing link errors when the gcn libgomp.a is linked.
It also shows up with sollve_vv.
The implementation does a bit copy'n
Hi,
This patch is needed for AMD GCN offloading when we use the assembler
from LLVM 13+.
The GCN runtime (libgomp+ROCm) requires that the location of all
variables in the offloaded variables table are discoverable at runtime
(using the "hsa_executable_symbol_get_info" API), and this only wor
On 24/08/2021 12:43, Richard Biener via Gcc-patches wrote:
On Tue, Aug 24, 2021 at 12:23 PM Thomas Schwinge
wrote:
Hi!
On 2021-08-19T22:13:56+0200, I wrote:
On 2021-08-16T10:21:04+0200, Jakub Jelinek wrote:
On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote:
|> Concerning the
I've committed this patch to allow GCC to adapt to the different
variants of the LLVM amdgcn assembler. Unfortunately they keep making
changes without maintaining backwards compatibility.
GCC should now work with LLVM 9, LLVM 12, and LLVM 13 in terms of CLI
usage, however only LLVM 9 is well t