[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #9 from GCC Commits --- The trunk branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:348d890c287a7ec4c88d3082ae6105537bd39398 commit r15-2161-g348d890c287a7ec4c88d3082ae6105537bd39398 Author: Richard Sandiford Date: Fri Jul 19 19:09:37 2024 +0100 Treat boolean vector elements as 0/-1 [PR115406] Previously we built vector boolean constants using 1 for true elements and 0 for false elements. This matches the predicates produced by SVE's PTRUE instruction, but leads to a miscompilation on AVX, where all bits of a boolean element should be set. One option for RTL would be to make this target-configurable. But that isn't really possible at the tree level, where vectors should work in a more target-independent way. (There is currently no way to create a "generic" packed boolean vector, but never say never :)) And, if we were going to pick a generic behaviour, it would make sense to use 0/-1 rather than 0/1, for consistency with integer vectors. Both behaviours should work with SVE on read, since SVE ignores the upper bits in each predicate element. And the choice shouldn't make much difference for RTL, since all SVE predicate modes are expressed as vectors of BI, rather than of multi-bit booleans. I suspect there might be some fallout from this change on SVE. But I think we should at least give it a go, and see whether any fallout provides a strong counterargument against the approach. gcc/ PR middle-end/115406 * fold-const.cc (native_encode_vector_part): For vector booleans, check whether an element is nonzero and, if so, set all of the correspending bits in the target image. * simplify-rtx.cc (native_encode_rtx): Likewise. gcc/testsuite/ PR middle-end/115406 * gcc.dg/torture/pr115406.c: New test.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 Richard Sandiford changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #10 from Richard Sandiford --- Fixed. Although the problem is latent on branches, I think we should only consider a backport after a few months, to give plenty of time for fallout to be detected.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 Richard Biener changed: What|Removed |Added Last reconfirmed|2024-06-10 00:00:00 |2024-7-18 CC||rsandifo at gcc dot gnu.org --- Comment #8 from Richard Biener --- IIRC Richard S. had a patch to native_encode_vector_part to fix this by making it set all bits instead of just the LSB?
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #7 from Hongtao Liu --- > > BTW, when assign -1 to vector(1) , should the upper bit be > cleared? Look like only 1 element boolean vector is cleared, but not > vector(2) . > If the upper bits are not cleared, both 2 cases are equal. diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 710d697c021..0f045f851d1 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, { tree itype = TREE_TYPE (TREE_TYPE (expr)); if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) - && TYPE_PRECISION (itype) <= BITS_PER_UNIT) + && TYPE_PRECISION (itype) < BITS_PER_UNIT) { /* This is the only case in which elements can be smaller than a byte. Element 0 is always in the lsb of the containing byte. */ Can fix this. It looks like it supposed to handle for itype *less than* but not *less equal* BITS_PER_UNIT?
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #6 from Hongtao Liu --- For 1 element vector, when backend doesn't support it's vector mode, the scalar mode is used for the type, which makes expand_vec_cond_expr_p use QImode for icode check.(vcond_mask_qiqi) It could also be the case when both data type and cmp_type are vector_boolean_type. It looks like vcond_mask_qiqi is dichotomous. For the former, it should be operands[3] == 1 ? operands[1] : operands[2] since mask is vector 1 boolean. For the latter, it should be (operand[1] & operand[3]) | (operand[2] & ~operand[3]) BTW, when assign -1 to vector(1) , should the upper bit be cleared? Look like only 1 element boolean vector is cleared, but not vector(2) . If the upper bits are not cleared, both 2 cases are equal.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #5 from Hongtao Liu --- > _2 = VEC_COND_EXPR <_1, { -1 }, { 0 }>; Hmm, it should check vcond_mask_qiv1qi instead of vcond_mask_qiqi, I guess since the backend doesn't supports v1qi, TYPE_MODE of V is QImode, then it wrongly checked vcond_mask_qiqi.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #4 from Hongtao Liu --- > > and for _2 = VIEW_CONVERT_EXPR(_1); we explicitly > clear the upper bits due to PR113576, and then we get 1 hit the abort. It's not VIEW_CONVERT_EXPR clear the uppper bits, but _1 = { -1 };
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 --- Comment #3 from Hongtao Liu --- typedef __attribute__((__vector_size__ (1))) char V; char foo (V v) { return ((V) v == v)[0]; } int main () { char x = foo ((V) { }); if (x != -1) __builtin_abort (); } w/ vcond_mask_qiqi, it's not lowered by veclower, and we get char foo (V v) { vector(1) signed char D.5142; char D.5141; vector(1) _1; vector(1) signed char _2; char _5; : _1 = { -1 }; _2 = VEC_COND_EXPR <_1, { -1 }, { 0 }>; D.5142 = _2; _5 = VIEW_CONVERT_EXPR(D.5142); : : return _5; } But it's further simplified to char foo (V v) { vector(1) signed char D.3765; char D.3764; vector(1) _1; vector(1) signed char _2; char _5; : _1 = { -1 }; _2 = VIEW_CONVERT_EXPR(_1); D.3765 = _2; _5 = VIEW_CONVERT_EXPR(D.3765); : : return _5; } by isel and for _2 = VIEW_CONVERT_EXPR(_1); we explicitly clear the upper bits due to PR113576, and then we get 1 hit the abort. It sound to me _1 = { -1 }; _2 = VEC_COND_EXPR <_1, { -1 }, { 0 }>; shouldn't be simplified to _2 = VIEW_CONVERT_EXPR(_1); when nunits is less than mode precision since the upper bit will be cleared.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 Hongtao Liu changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #2 from Hongtao Liu --- I'll take a look.
[Bug target/115406] [15 Regression] wrong code with vector compare at -O0 with -mavx512f since r15-920-gb6c6d5abf0d31c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115406 Sam James changed: What|Removed |Added Summary|[15 Regression] wrong code |[15 Regression] wrong code |with vector compare at -O0 |with vector compare at -O0 |with -mavx512f |with -mavx512f since ||r15-920-gb6c6d5abf0d31c Last reconfirmed||2024-06-10 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 CC||liuhongt at gcc dot gnu.org --- Comment #1 from Sam James --- r15-920-gb6c6d5abf0d31c