[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #24 from Richard Biener --- Fixed.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #23 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:605c2a393d3a2db86454a70fd7c9467db434060c commit r11-4381-g605c2a393d3a2db86454a70fd7c9467db434060c Author: Richard Biener Date: Fri Oct 23 08:40:15 2020 +0200 middle-end/97521 - always use single-bit bools in mask vector types This makes us always use a single-bit boolean type component type for integer mode mask VECTOR_BOOLEAN_TYPE_P to match the RTL and target representation. This aovids the need for magic translation and the inconsistencies from the translation requirement now that we expose temporaries of those types on the GIMPLE level. 2020-10-23 Richard Biener PR middle-end/97521 * expr.c (const_scalar_mask_from_tree): Remove. (expand_expr_real_1): Always VIEW_CONVERT integer mode vector constants to an integer type. * tree.c (build_truth_vector_type_for_mode): Use a single-bit boolean component type for non-vector-mode mask_mode. * gcc.target/i386/pr97521.c: New testcase.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #22 from Andrew Stubbs --- (In reply to Andrew Stubbs from comment #21) > (In reply to Richard Biener from comment #19) > > GCN also uses MODE_INT for the mask mode and thus may be similarly affected. > > Andrew - are the bits in the mask dense? Thus for a V4SImode compare > > would the mask occupy only the lowest 4 bits of the DImode mask? > > Yes, that's correct. Or rather, I should say that that *will* be the case when I add partial vector support; right now it can only be done via masking V64SImode. A have a patch set, but the last problem is that while_ult doesn't operate on partial integer masks, leading to wrong code. AArch64 doesn't have a problem with this because it uses VBI masks of the right size. I have a patch that adds the vector size as an operand to while_ult; this seems to fix the problems on GCN, but I need to make corresponding changes for AArch64 also before I can submit those patches, and time is tight.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #21 from Andrew Stubbs --- (In reply to Richard Biener from comment #19) > GCN also uses MODE_INT for the mask mode and thus may be similarly affected. > Andrew - are the bits in the mask dense? Thus for a V4SImode compare > would the mask occupy only the lowest 4 bits of the DImode mask? Yes, that's correct.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #20 from Richard Biener --- So at least the tree.c use of get_mask_mode only passes it MODE_VECTOR so we don't have to second-guess the component size when passed a MODE_INT used as representation for an integer vector type. So I'm testing the following tree build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode) { gcc_assert (mask_mode != BLKmode); unsigned HOST_WIDE_INT esize; if (VECTOR_MODE_P (mask_mode)) { poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode); esize = vector_element_size (vsize, nunits); } else esize = 1; tree bool_type = build_nonstandard_boolean_type (esize); return make_vector_type (bool_type, nunits, mask_mode); }
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Richard Biener changed: What|Removed |Added CC||ams at gcc dot gnu.org --- Comment #19 from Richard Biener --- GCN also uses MODE_INT for the mask mode and thus may be similarly affected. Andrew - are the bits in the mask dense? Thus for a V4SImode compare would the mask occupy only the lowest 4 bits of the DImode mask?
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #18 from Richard Biener --- Maybe the x86 backend should use partial integer modes here (though we'd have quite some, eventually not even possible), so the mask mode precision would tell us how to build the types. Or the targetm.vectorize.get_mask_mode needs to be amended to return the number of bits used per lane. So the following "works" for the testcase in this bug and the FAILs reported but of course breaks the non-AVX512 case because we cannot distinguish between integer mode vectors and integer mode masks :/ The target hook docs leave the option to return no mask mode but the default implementation would already always return an integer vector mode if it exists. So given that changing x86 to use MODE_VECTOR_BOOL is unlikely we need a way to distinguish MODE_INT vectors from MODE_INT vector bools, thus, the target needs to tell us the precision of the elements. diff --git a/gcc/expr.c b/gcc/expr.c index 9d951e82522..ae16f077758 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -10356,16 +10355,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, scalar_int_mode int_mode; if (is_int_mode (mode, _mode)) { - if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) - return const_scalar_mask_from_tree (int_mode, exp); - else - { - tree type_for_mode - = lang_hooks.types.type_for_mode (int_mode, 1); - if (type_for_mode) - tmp = fold_unary_loc (loc, VIEW_CONVERT_EXPR, - type_for_mode, exp); - } + tree type_for_mode = lang_hooks.types.type_for_mode (int_mode, 1); + if (type_for_mode) + tmp = fold_unary_loc (loc, VIEW_CONVERT_EXPR, + type_for_mode, exp); } if (!tmp) { diff --git a/gcc/tree.c b/gcc/tree.c index 555ba97e68b..608f124 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -10926,9 +10926,7 @@ build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode) { gcc_assert (mask_mode != BLKmode); - poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode); - unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits); - tree bool_type = build_nonstandard_boolean_type (esize); + tree bool_type = build_nonstandard_boolean_type (1); return make_vector_type (bool_type, nunits, mask_mode); }
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #17 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:7cda498920dbf244e9e06fdb2fc710a118a8c033 commit r11-4278-g7cda498920dbf244e9e06fdb2fc710a118a8c033 Author: Richard Biener Date: Fri Oct 23 08:21:39 2020 +0200 Revert "middle-end/97521 - fix VECTOR_CST expansion" 2020-10-23 Richard Biener PR middle-end/97521 * expr.c (expand_expr_real_1): Revert last change. * gcc.target/i386/pr97521.c: Remove. This reverts commit b960a9c83a93b58a84a7a370002990810675ac5d.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #16 from Richard Biener --- (In reply to Richard Biener from comment #15) > CI with -march=cascadelake reports > .. > FAIL: gcc.target/i386/avx2-vpcmpeqq-2.c execution test expands (gdb) p debug_tree (exp) unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7683a348 precision:2 min max > QI size unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7683a3f0 nunits:4> constant tree_0 npatterns:2 nelts-per-pattern:2 elt:0: constant 0> elt:1: constant -1> elt:2: elt:3: > which shows the heuristic cannot work. We possibly can refine it to key on mode-precision component types - which _might_ work since it seems x86 uses the smallest integer mode to hold nunits bits - but that's of course not something guaranteed for non-x86. I wonder why we're insisting to "fill" the mask mode on GENERIC/GIMPLE while RTL produces packed bits. Thus, why do we use a QImode vector(4) here instead of a QImode vector(4) if the target in the end will produce that from say, a V4SImode compare-to-mask? As long as we didn't expose temporaries of those types this was well-hidden up to RTL expansion which then did "magic" but now we're really facing inconsistent representations. Now targets _could_ opt to use QImode vector(4) but then with representing { -1, -1, -1, -1 } as 0b (with the 'padding bits' sign-extended). For now I'm going to revert the patch but I still believe const_scalar_mask_from_tree is a red herring.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Richard Biener changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #15 from Richard Biener --- CI with -march=cascadelake reports FAIL: c-c++-common/torture/vector-compare-2.c -O0 execution test FAIL: gcc.target/i386/avx2-vpcmpeqq-2.c execution test FAIL: gfortran.dg/allocate_zerosize_3.f -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gfortran.dg/allocate_zerosize_3.f -O3 -g execution test FAIL: gfortran.dg/bound_6.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gfortran.dg/bound_6.f90 -O3 -g execution test
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #14 from Richard Biener --- Accidentially pushed it before knowing AVX512 HW results. Let's hope fixed.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #13 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:b960a9c83a93b58a84a7a370002990810675ac5d commit r11-4226-gb960a9c83a93b58a84a7a370002990810675ac5d Author: Richard Biener Date: Thu Oct 22 09:29:47 2020 +0200 middle-end/97521 - fix VECTOR_CST expansion This fixes expansion of VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs which when using an integer mode are not always "mask-mode" but may be using an integer mode when there's no supported vector mode. The patch makes sure to only go the mask-mode expansion if the elements do not line up to cover the full integer mode (when they do and the mode was an actual mask-mode there's no actual difference in both expansions). 2020-10-22 Richard Biener PR middle-end/97521 * expr.c (expand_expr_real_1): Be more careful when expanding a VECTOR_BOOLEAN_TYPE_P VECTOR_CSTs. * gcc.target/i386/pr97521.c: New testcase.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #12 from rguenther at suse dot de --- On Thu, 22 Oct 2020, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 > > --- Comment #11 from Jakub Jelinek --- > We don't necessarily need to make it user accessible. Though I've missed a way to create VECTOR_BOOLEAN_TYPE_P vector types for the GIMPLE FE (which measn the C FE has to have a way to create them).
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #11 from Jakub Jelinek --- We don't necessarily need to make it user accessible.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #10 from Hongtao.liu --- Speaking about how to represent the V*BImode constants, i think we need to extend attribute vector_size to handle something like --- typedef bool v8bi __attribute__ ((vector_size (1))); --- currently there would be an error as :1:51: error: invalid vector type for attribute 'vector_size' 1 | typedef bool v8qi __attribute__ ((vector_size (1))); also we need to support type convert between vector boolean types and integer in front end, since many users may use integer as a mask.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #9 from Jakub Jelinek --- Looking at simplify-rtx.c, I think as long as the constants are CONST_INTs, simplify-rtx.c doesn't care if the non-constant is represented with scalar int mode registers (etc.) or V*BImode.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #8 from Jakub Jelinek --- Well, I think it is certainly desirable to keep using VEC_MERGE, not only because it is less IL. But pedantically already what i386 backend does doesn't match the documentation which says that the last operand of VEC_MERGE is a CONST_INT. We could just say that it is either a CONST_INT, or an RTL expression with a scalar integral mode, or V*BImode mode. And another question is how to represent the V*BImode constants, whether to represent them as normal VOIDmode CONST_INTs, or something else (const_vector:V*BI ...).
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #7 from rguenther at suse dot de --- On Thu, 22 Oct 2020, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 > > Hongtao.liu changed: > >What|Removed |Added > > CC||crazylht at gmail dot com > > --- Comment #6 from Hongtao.liu --- > (In reply to Richard Biener from comment #4) > > (In reply to Jakub Jelinek from comment #3) > > > And we do that because: > > > case VECTOR_CST: > > > { > > > tree tmp = NULL_TREE; > > > if (VECTOR_MODE_P (mode)) > > > return const_vector_from_tree (exp); > > > scalar_int_mode int_mode; > > > if (is_int_mode (mode, _mode)) > > > { > > > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > > > return const_scalar_mask_from_tree (int_mode, exp); > > > > but I have no easy access to a AVX512 runtime system (and no idea how to > > make dejagnu use sde for a i386.exp testsuite run). On a AVX2 system > > i386.exp is clean with the above. > > > > Thoughts? > > > > [AVX512 should have used VnBImode like SVE does] > > > > > Can vec_merge support operands with VnBImode as items? I didn't find it's used > in any SVE patterns. You can use (vec_select (vec_concat ...)) instead of (vec_merge ...). It's a long standing issue that we have two ways of expressing the same (each with it's own drawbacks).
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #6 from Hongtao.liu --- (In reply to Richard Biener from comment #4) > (In reply to Jakub Jelinek from comment #3) > > And we do that because: > > case VECTOR_CST: > > { > > tree tmp = NULL_TREE; > > if (VECTOR_MODE_P (mode)) > > return const_vector_from_tree (exp); > > scalar_int_mode int_mode; > > if (is_int_mode (mode, _mode)) > > { > > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > > return const_scalar_mask_from_tree (int_mode, exp); > > but I have no easy access to a AVX512 runtime system (and no idea how to > make dejagnu use sde for a i386.exp testsuite run). On a AVX2 system > i386.exp is clean with the above. > > Thoughts? > > [AVX512 should have used VnBImode like SVE does] > Can vec_merge support operands with VnBImode as items? I didn't find it's used in any SVE patterns.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #5 from Jakub Jelinek --- I can test it on skylake-avx512.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Richard Biener --- (In reply to Jakub Jelinek from comment #3) > And we do that because: > case VECTOR_CST: > { > tree tmp = NULL_TREE; > if (VECTOR_MODE_P (mode)) > return const_vector_from_tree (exp); > scalar_int_mode int_mode; > if (is_int_mode (mode, _mode)) > { > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > return const_scalar_mask_from_tree (int_mode, exp); I think const_scalar_mask_from_tree is completely wrong in ignoring the precision of the VECTOR_BOOLEAN component type. At least it is inconsistent with the processing of VECTOR_BOOLEAN_TYPE_P CTORs. Now I can see that this was intended for AVX512 but I can't see how this "inconsistency" can possibly work if you consider the use of _1: _19 = BIT_FIELD_REF <_1, 64, 0>; if you disable TER then how should we ever able to "reinterpret" the BIT_FIELD_REF to the "narrowed" view of the VECTOR_BOOLEAN_TYPE_P mask? A heuristic for an intermediate hack might be to skip const_scalar_mask_from_tree when the component precision times the number of elements equals the mode precision of the vector. Thus the following fixes the testcase diff --git a/gcc/expr.c b/gcc/expr.c index 9d951e82522..d87bda777d0 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -10356,7 +10356,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, scalar_int_mode int_mode; if (is_int_mode (mode, _mode)) { - if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)) + && maybe_ne (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (exp))) +* TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)), +GET_MODE_PRECISION (int_mode))) return const_scalar_mask_from_tree (int_mode, exp); else but I have no easy access to a AVX512 runtime system (and no idea how to make dejagnu use sde for a i386.exp testsuite run). On a AVX2 system i386.exp is clean with the above. Thoughts? [AVX512 should have used VnBImode like SVE does] > where the VECTOR_CST type is now a vector boolean with DImode element type > and TImode as the (poor man's) vector mode. > > So, the question is how to differentiate between vector booleans that want > to be a bitmask in an integral mode vs. poor man's vector booleans for which > we'd want to fallthru into the VIEW_CONVERT_EXPR code below this. > And what other spots need that. > Perhaps check if the bitsize (or precision?) of the vector type's mode is > equal to bitsize (or precision?) of the element mode times number of > elements in the vector?
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org Known to work||10.2.0 Priority|P3 |P1
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #3 from Jakub Jelinek --- And we do that because: case VECTOR_CST: { tree tmp = NULL_TREE; if (VECTOR_MODE_P (mode)) return const_vector_from_tree (exp); scalar_int_mode int_mode; if (is_int_mode (mode, _mode)) { if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) return const_scalar_mask_from_tree (int_mode, exp); where the VECTOR_CST type is now a vector boolean with DImode element type and TImode as the (poor man's) vector mode. So, the question is how to differentiate between vector booleans that want to be a bitmask in an integral mode vs. poor man's vector booleans for which we'd want to fallthru into the VIEW_CONVERT_EXPR code below this. And what other spots need that. Perhaps check if the bitsize (or precision?) of the vector type's mode is equal to bitsize (or precision?) of the element mode times number of elements in the vector?
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 --- Comment #2 from Jakub Jelinek --- It seems that vector(2) _1; vector(2) _2; ... _1 = _2 & { 0, -1 }; remains in the IL after that change, which results in expansion of TImode AND, but the second operand of that is (const_int 2 [0x2]) and not the expected 0x constant.
[Bug target/97521] [11 Regression] wrong code with -mno-sse2 since r11-3394
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521 Jakub Jelinek changed: What|Removed |Added Ever confirmed|0 |1 CC||jakub at gcc dot gnu.org Status|UNCONFIRMED |NEW Target Milestone|--- |11.0 Summary|[11 Regression] wrong code |[11 Regression] wrong code |with -mno-sse2 |with -mno-sse2 since ||r11-3394 Last reconfirmed||2020-10-21 --- Comment #1 from Jakub Jelinek --- Started with r11-3394-gbc909324bda71543add2229adfa59d8daff5f0db