Re: [OG12 commit] vect: WORKAROUND vectorizer bug
On 24/10/2022 19:06, Richard Biener wrote:
>> On 24.10.2022 at 18:51, Andrew Stubbs wrote:
>>
>> I've committed this to the OG12 branch to remove some test failures. We
>> probably ought to have something on mainline also, but a proper fix would
>> be better.
>>
>> Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
>> fails to compile due to an ICE. The OpenACC worker broadcasting code is
>> creating SLP optimizable loads and stores in amdgcn address-space-4.
>> Previously this was "ok" as SLP didn't work with less than 64-lane
>> vectors, but the newly implemented smaller vectors are working as
>> intended and optimizing this.
>>
>> Unfortunately the vectorizer is losing the address-space data from the
>> intermediate types, and it all falls apart during expand when it tries to
>> convert a 32-bit address into a 64-bit address and that's not something
>> that works. At first sight it looks like we could possibly make that work
>> with POINTERS_EXTEND_UNSIGNED, but that only changes the error message.
>> Fundamentally we need to make sure that various instances of "vectype"
>> have the correct address space, but my attempts to do so showed that
>> that's a larger task than I have time for right now.
>
> ISTR there were issues like this in the past that I fixed, so any testcase
> that exposes this with just a gcn cc1 would be nice to have.

I've been unable to reproduce this issue on the mainline compiler. The SLP
vectorizer says the accesses are not consecutive, although I don't know why
they would be different.

A simple testcase works fine on OG12 as well. It's something weird to do with
the OpenACC worker broadcasting code that I can't reproduce manually.

Thank you for the offer. I'll let you know if I get a testcase.

Andrew
Re: [OG12 commit] vect: WORKAROUND vectorizer bug
> On 24.10.2022 at 18:51, Andrew Stubbs wrote:
>
> I've committed this to the OG12 branch to remove some test failures. We
> probably ought to have something on mainline also, but a proper fix would be
> better.
>
> Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
> fails to compile due to an ICE. The OpenACC worker broadcasting code is
> creating SLP optimizable loads and stores in amdgcn address-space-4.
> Previously this was "ok" as SLP didn't work with less than 64-lane vectors,
> but the newly implemented smaller vectors are working as intended and
> optimizing this.
>
> Unfortunately the vectorizer is losing the address-space data from the
> intermediate types, and it all falls apart during expand when it tries to
> convert a 32-bit address into a 64-bit address and that's not something that
> works. At first sight it looks like we could possibly make that work with
> POINTERS_EXTEND_UNSIGNED, but that only changes the error message.
> Fundamentally we need to make sure that various instances of "vectype" have
> the correct address space, but my attempts to do so showed that that's a
> larger task than I have time for right now.

ISTR there were issues like this in the past that I fixed, so any testcase
that exposes this with just a gcn cc1 would be nice to have.

Richard

> This patch simply prevents the vectorizer working in the case where it would
> break. This should not be a regression because this code didn't vectorize at
> all, previously.
>
> Andrew
> <221024-workarround-vec-addrspace-bug.patch>
[OG12 commit] vect: WORKAROUND vectorizer bug
I've committed this to the OG12 branch to remove some test failures. We
probably ought to have something on mainline also, but a proper fix would be
better.

Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
fails to compile due to an ICE. The OpenACC worker broadcasting code is
creating SLP optimizable loads and stores in amdgcn address-space-4.
Previously this was "ok" as SLP didn't work with less than 64-lane vectors,
but the newly implemented smaller vectors are working as intended and
optimizing this.

Unfortunately the vectorizer is losing the address-space data from the
intermediate types, and it all falls apart during expand when it tries to
convert a 32-bit address into a 64-bit address and that's not something that
works. At first sight it looks like we could possibly make that work with
POINTERS_EXTEND_UNSIGNED, but that only changes the error message.
Fundamentally we need to make sure that various instances of "vectype" have
the correct address space, but my attempts to do so showed that that's a
larger task than I have time for right now.

This patch simply prevents the vectorizer working in the case where it would
break. This should not be a regression because this code didn't vectorize at
all, previously.

Andrew

vect: WORKAROUND vectorizer bug

This patch disables vectorization of memory accesses to non-default address
spaces where the pointer size is different to the usual pointer size. This
condition typically occurs in OpenACC programs on amdgcn, where LDS memory is
used for broadcasting gang-private variables between threads. In particular,
see libgomp.oacc-c-c++-common/private-variables.c

The problem is that the address space information is dropped from the various
types in the middle-end and eventually it triggers an ICE trying to do an
address conversion. That ICE can be avoided by defining
POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on.

A correct solution would ensure that all the vectypes have the correct address
spaces, but I don't have time for that right now.

gcc/ChangeLog:

        * tree-vect-data-refs.cc (vect_analyze_data_refs): Workaround an
        address-space bug.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 09223baf718..70b671ed94a 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4598,7 +4598,21 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
       /* Set vectype for STMT.  */
       scalar_type = TREE_TYPE (DR_REF (dr));
       tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
-      if (!vectype)
+
+      /* FIXME: If the object is in an address-space in which the pointer size
+         is different to the default address space then vectorizing here will
+         lead to an ICE down the road because the address space information
+         gets lost.  This work-around fixes the problem until we have a proper
+         solution.  */
+      tree base_object = DR_REF (dr);
+      tree op = (TREE_CODE (base_object) == COMPONENT_REF
+                 || TREE_CODE (base_object) == ARRAY_REF
+                 ? TREE_OPERAND (base_object, 0) : base_object);
+      addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (op));
+      bool addr_space_bug = (!ADDR_SPACE_GENERIC_P (as)
+                             && targetm.addr_space.pointer_mode (as) != Pmode);
+
+      if (!vectype || addr_space_bug)
         {
           if (dump_enabled_p ())
             {