[Bug tree-optimization/116120] New: Wrong code for (a ? x : y) != (b ? x : y)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116120

            Bug ID: 116120
           Summary: Wrong code for (a ? x : y) != (b ? x : y)
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC is miscompiling the functions in g++.dg/tree-ssa/pr50.C, such as:

typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));

v4si f1_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
  v4si X = a == b ? e : f;
  v4si Y = c == d ? e : f;
  return (X != Y);
}

The reason is that PR50 implemented match patterns of the form:

  (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE : FALSE

But this optimization is not correct -- the optimized code gives us a
different result for:

  a = TRUE
  b = FALSE
  x = 0
  y = 0
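The counterexample can be checked with a scalar model of the two forms. This is only a minimal sketch, not the reporter's test: the helper names original, rewritten, forms_agree and self_check are made up for illustration, and plain bool/int scalars stand in for the vector lanes.

```c
#include <assert.h>
#include <stdbool.h>

/* Source form: compare the two selected values. */
static bool original(bool a, bool b, int x, int y)
{
    return (a ? x : y) != (b ? x : y);
}

/* Rewritten form from the report: depends only on a ^ b. */
static bool rewritten(bool a, bool b)
{
    return a ^ b;
}

/* Returns true if both forms agree for every a and b, given x and y. */
static bool forms_agree(int x, int y)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (original(a, b, x, y) != rewritten(a, b))
                return false;
    return true;
}

/* Hypothetical self-check using the values from the report. */
static void self_check(void)
{
    assert(original(true, false, 0, 0) == false);  /* 0 != 0 is false */
    assert(rewritten(true, false) == true);        /* TRUE ^ FALSE */
}
```

The two forms agree whenever x != y; they only diverge when x == y, which is the case the rewrite does not account for.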
[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

            Bug ID: 114090
           Summary: forwprop -fwrapv miscompilation
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function f below returns an incorrect result for INT_MIN when compiled
with -O1 -fwrapv for X86_64:

__attribute__((noipa)) int
f(int x)
{
  int w = (x >= 0 ? x : 0);
  int y = -x;
  int z = (y >= 0 ? y : 0);
  return w + z;
}

int
main ()
{
  if (f(0x80000000) != 0)
    __builtin_abort ();
  return 0;
}

What is happening is that forwprop has optimized

  w_2 = MAX_EXPR <x_1(D), 0>;
  y_3 = -x_1(D);
  z_4 = MAX_EXPR <y_3, 0>;
  _5 = w_2 + z_4;
  return _5;

to

  _5 = ABS_EXPR <x_1(D)>;
  return _5;
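The difference for INT_MIN can be reproduced without relying on -fwrapv by emulating wrapping arithmetic through unsigned operations. This is a sketch under assumptions: the helpers wrap_neg, wrap_add, max0, f_source, f_abs and self_check are hypothetical names, and the unsigned-to-int conversion is implementation-defined in ISO C (GCC defines it as reduction modulo 2^N).

```c
#include <assert.h>
#include <limits.h>

/* Wrapping negation and addition, emulated with unsigned arithmetic.
   The conversion back to int is implementation-defined; GCC wraps. */
static int wrap_neg(int x) { return (int)(0u - (unsigned)x); }
static int wrap_add(int a, int b) { return (int)((unsigned)a + (unsigned)b); }

static int max0(int v) { return v >= 0 ? v : 0; }

/* f as written in the report, under wrapping semantics. */
static int f_source(int x)
{
    return wrap_add(max0(x), max0(wrap_neg(x)));
}

/* The single absolute-value operation that the two maxima were folded to. */
static int f_abs(int x)
{
    return x < 0 ? wrap_neg(x) : x;
}

/* Hypothetical self-check: the forms differ only for INT_MIN. */
static void self_check(void)
{
    assert(f_source(INT_MIN) == 0);
    assert(f_abs(INT_MIN) == INT_MIN);
}
```

For INT_MIN the wrapped negation is INT_MIN again, so both maxima are 0 in the source form, while the absolute value stays at INT_MIN.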
[Bug tree-optimization/114056] New: ifcvt may introduce use of uninitialized variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114056 Bug ID: 114056 Summary: ifcvt may introduce use of uninitialized variables Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The ifcvt pass may make the code more UB by doing operations on uninitialized variables, which can be seen by compiling the following (from gcc.c-torture/compile/pr80422.c) with -O2 for X86_64: int a, c, f; short b, d, e; int fn1 (int h) { return a > 2 || h > a ? h : h << a; } void fn2 () { int j, k; while (1) { k = c && b; f &= e > (fn1 (k) && j); if (!d) break; } } What is happening here is that .LOOP_VECTORIZED (1, 2) != 0 branches to bb 16 with _17 uninitialized, which is then used in some calculations: _34 = .LOOP_VECTORIZED (2, 3); if (_34 != 0) goto ; [100.00%] else goto ; [100.00%] [local count: 77953654]: [local count: 708669600]: # _13 = PHI <_24(27), _17(D)(45)> _18 = _13 <= 0; _14 = _9 & _18; _27 = _13 > 0; _28 = _9 & _27; _29 = _13 < -29020049; _30 = ~_29; _31 = _14 & _30; _12 = _15 ? _3 : _13; _42 = (unsigned int) _12; _43 = _42 * 4294967222; _32 = _15 | _28; _33 = _31 | _32; _23 = _33 ? _43 : 4294967222; _24 = _33 ? _12 : _13; if (x_6(D) > _23) goto ; [11.00%] else goto ; [89.00%] This does not affect the result, but the discussion about the semantics of uninitialized variables on the mailing list a while back concluded that operations on uninitialized data is UB (with a few exceptions related to moving data...).
[Bug tree-optimization/114032] New: ifcvt may introduce UB calls to __builtin_clz(0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114032 Bug ID: 114032 Summary: ifcvt may introduce UB calls to __builtin_clz(0) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The ifcvt pass may make the code more UB, which can be seen by compiling the following function with -O3 for X86_64: int a, b, i; int scaleValueSaturate(int value) { if (value) { int result = __builtin_clz(value); if (-result <= a) return 0; } return b; } short dst; short *src; void scaleValuesSaturate() { for (; i; i++) dst = scaleValueSaturate(src[i]); } What is happening here is that the code for .LOOP_VECTORIZED (1, 2) != 0 always calls __builtin_clz, even when value is 0: [local count: 955630224]: # i.5_21 = PHI <_7(9), i.5_20(24)> _2 = (long unsigned int) i.5_21; _3 = _2 * 2; _4 = src.2_1 + _3; _5 = *_4; value.0_11 = (unsigned int) _5; result_14 = __builtin_clz (value.0_11); _47 = (unsigned int) result_14; _48 = -_47; _15 = (int) _48; _23 = _5 != 0; _28 = _15 <= a.1_16; _46 = _23 & _28; prephitmp_31 = _46 ? 0 : _30; dst = prephitmp_31; _7 = i.5_21 + 1; i = _7; if (_7 != 0) goto ; [89.00%] else goto ; [11.00%]
[Bug tree-optimization/113703] ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

--- Comment #3 from Krister Walfridsson ---
Oops. I messed up the test case... It "works", but the actual values do not
make sense... The following is better:

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -2 - (uintptr_t)(p+pgsz);
  f1 (p+pgsz, -2, n);
  return 0;
}
[Bug tree-optimization/113703] ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

--- Comment #2 from Krister Walfridsson ---
Here is a runtime testcase:

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

__attribute__((noipa)) void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -3 - (uintptr_t)p;
  f1 (p+2, -2, n);
  return 0;
}
[Bug tree-optimization/113703] New: ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

            Bug ID: 113703
           Summary: ivopts miscompiles loop
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function (gcc.dg/tree-ssa/ivopts-lt.c) is miscompiled when
compiled with -O1 for X86_64:

#include "stdint.h"

void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

The IR after cunroll looks like:

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  :
  p_6 = p_4(D) + i_5(D);

  :
  # p_1 = PHI
  # i_2 = PHI
  *p_1 = 0;
  p_9 = p_1 + 1;
  i_10 = i_2 + 1;
  if (i_10 < n_11(D))
    goto ;
  else
    goto ;

  :
  goto ;

  :
  return;
}

This is then changed by ivopts to

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  sizetype _13;
  char * _14;

  :
  p_6 = p_4(D) + i_5(D);
  _13 = n_11(D) - i_5(D);
  _14 = p_6 + _13;

  :
  # p_1 = PHI
  MEM[(char *)p_1] = 0;
  p_9 = p_1 + 1;
  if (p_9 < _14)
    goto ;
  else
    goto ;

  :
  goto ;

  :
  return;
}

Suppose the function gets called with the values:

  p = 0x0002
  i = 0x0001
  n = 0xdffd7fff

The original function writes 0 to address 0x0002, and then exits. The
optimized function overflows when calculating _14, and the function does
the equivalent of

  memset(0x0002, 0, 0xdffe7ffe);
[Bug tree-optimization/113630] New: -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630 Bug ID: 113630 Summary: -fno-strict-aliasing introduces out-of-bounds memory access Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The test gcc.dg/torture/pr110799.c crashes because of an out of bounds memory access when compiled with "-O2 -fno-strict-aliasing". What is happening is that the pre pass has changed struct S { int a; }; struct M { int a, b; }; __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; : if (c_2(D) != 0) goto ; else goto ; : if (d_6(D) != 0) goto ; else goto ; r_8 = p_4(D)->a; goto ; r_7 = MEM[(struct M *)p_4(D)].a; goto ; r_5 = MEM[(struct M *)p_4(D)].b; # r_1 = PHI return r_1; } by combining bb 4 and bb 5 and doing all accesses as struct M: __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; int pretmp_9; : if (c_2(D) != 0) goto ; [50.00%] else goto ; [50.00%] : pretmp_9 = MEM[(struct M *)p_4(D)].a; goto ; : r_5 = MEM[(struct M *)p_4(D)].b; : # r_1 = PHI return r_1; } This in turn allows later passes to hoist the two loads __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; int pretmp_9; : pretmp_9 = MEM[(struct M *)p_4(D)].a; r_5 = MEM[(struct M *)p_4(D)].b; if (c_2(D) != 0) goto ; else goto ; : : # r_1 = PHI return r_1; } which now reads out of bounds when we pass a struct S as f(, 1, 1).
[Bug tree-optimization/113590] New: The vectorizer introduces signed overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113590 Bug ID: 113590 Summary: The vectorizer introduces signed overflow Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vectorizer introduces new signed overflow in the function below when compiled with -O3 for x86_64: __attribute__ ((noinline)) int liveloop (int start, int n, int *x, int *y) { int i = start; int j; int ret; for (j = 0; j < n; ++j) { i += 1; x[j] = i; ret = y[j]; } return ret; } The vectorized loop looks like: [local count: 860067200]: # vect_vec_iv_.9_57 = PHI <_58(6), _55(9)> # vectp_x.11_61 = PHI # ivtmp_64 = PHI _58 = vect_vec_iv_.9_57 + { 4, 4, 4, 4 }; vect_i_13.10_60 = vect_vec_iv_.9_57 + { 1, 1, 1, 1 }; MEM [(int *)vectp_x.11_61] = vect_i_13.10_60; vectp_x.11_62 = vectp_x.11_61 + 16; ivtmp_65 = ivtmp_64 + 1; if (ivtmp_65 < bnd.5_47) goto ; [89.00%] else goto ; [11.00%] [local count: 765459809]: goto ; [100.00%] The problem arises from _58, which may overflow in the last iteration. For example, if the function is called as liveloop(0x7ff1, 12, p, q);
[Bug tree-optimization/113588] New: The vectorizer is introducing out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113588 Bug ID: 113588 Summary: The vectorizer is introducing out-of-bounds memory access Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function is miscompiled for x86_64 when compiled with -O3 -march=x86-64-v2 unsigned long foo (const char *s, unsigned long n) { unsigned long len = 0; while (*s++ && n--) ++len; return len; } The original function reads two bytes from 's' when called as: char a[4]; a[0] = 1; a[1] = 0; foo(a, 1000); However, the vectorized function reads 16 bytes (thereby accessing the buffer out of bounds) as it reads one vector at a time when s[0] != 0 and n >= 16.
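The claim that the scalar loop touches only two bytes for this input can be checked by counting the loads. The sketch below is an illustration, not part of the report: bytes_read_scalar and self_check are hypothetical helpers that mirror the loop condition `*s++ && n--` and count each load of *s.

```c
#include <assert.h>

/* Counts how many bytes the scalar loop
     while (*s++ && n--) ++len;
   reads from s before it exits. */
static unsigned long bytes_read_scalar(const char *s, unsigned long n)
{
    unsigned long reads = 0;
    for (;;) {
        reads++;            /* the load performed by *s++ */
        if (!*s++)
            break;          /* terminating zero byte found */
        if (!n--)
            break;          /* length limit exhausted */
    }
    return reads;
}

/* Hypothetical self-check with the buffer from the report. */
static void self_check(void)
{
    char a[4] = { 1, 0, 0, 0 };
    assert(bytes_read_scalar(a, 1000) == 2);
}
```

Any vectorized version that loads a full 16-byte vector starting at a[0] therefore reads 12 bytes past the end of the 4-byte buffer, while the scalar loop stays within it.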
[Bug tree-optimization/113424] lim fails to notice possible aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424

Krister Walfridsson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |FIXED

--- Comment #4 from Krister Walfridsson ---
That makes sense. And it means the check for local variables I have
implemented in smtgcc needs some improvements...

Anyway, to answer the question from comment 2 (which I guess is irrelevant
now): the code is a slightly modified g++.dg/opt/pr80436.C which smtgcc
claimed was miscompiled because of this issue.
[Bug tree-optimization/113424] New: lim fails to notice possible aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424

            Bug ID: 113424
           Summary: lim fails to notice possible aliasing
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The lim pass miscompiles the following C++ program when compiled with -O3
for x86_64 (note: it works as intended when compiled as a C program)

struct {
  char elt1;
  char bits;
} *a;

char
bar (char *x, char b)
{
  if (0)
  next_bit:
    return 1;
  while (1)
    {
      if (b)
        if (a->bits)
          goto next_bit;
      *x = b;
      if (a->elt1)
        return 0;
      a = 0;
    }
}

The loop that lim gets as input looks as follows:

  if (b_9(D) != 0)
    goto ;
  else
    goto ;

  a.0_1 = a;
  _2 = a.0_1->bits;
  if (_2 != 0)
    goto ;
  else
    goto ;

  *x_10(D) = b_9(D);
  a.1_3 = a;
  _4 = a.1_3->elt1;
  if (_4 != 0)
    goto ; [5.50%]
  else
    goto ; [94.50%]

  a = 0B;
  goto ; [100.00%]

The lim pass changes this to load `a` before the loop and uses the same
value of `a` for both accesses in bb4 and bb5, which is not correct as the
store `*x_10(D)` may have modified `a` before the access in bb5.
[Bug tree-optimization/112949] evrp produces incorrect range for __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949 --- Comment #3 from Krister Walfridsson --- The C program is obviously UB. But the optimization is done on GIMPLE, and it is not obvious to me that the GIMPLE code is UB -- we have a function called __builtin_clz that calls an internal function, so they are different...
[Bug tree-optimization/112949] New: evrp produces incorrect range for __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949 Bug ID: 112949 Summary: evrp produces incorrect range for __builtin_clz Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The evrp pass generates incorrect ranges for __builtin_clz when it is called within a function named __builtin_clz. While calling it in this manner seems questionable, two relatively recent tests in the testsuite (gcc.dg/pr100521.c and gcc.dg/pr100790.c) suggest that gcc should handle this. The test case gcc.dg/pr100790.c is as follows: __builtin_clz(int x) { x ? __builtin_clz(x) : 32; } Compiling this for x86_64 using -O3 -fpermissive results in the evrp IR: Global Exported: iftmp.0_3 = [irange] int [1, 31] __attribute__((nothrow, leaf, const)) int __builtin_clz (int x) { int iftmp.0_3; : if (x_1(D) != 0) goto ; [INV] else goto ; [INV] : iftmp.0_3 = __builtin_clz (x_1(D)); : return; } The range for iftmp.0_3 (which is an internal call to CFN_BUILT_IN_CLZ) should be [0, 31], not [1, 31].
[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 --- Comment #9 from Krister Walfridsson --- I opened PR 112738 for the issue mentioned in comment 8.
[Bug tree-optimization/112738] New: forwprop4 introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112738 Bug ID: 112738 Summary: forwprop4 introduces invalid wide signed Boolean values Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The forwprop4 pass introduces an invalid wide Boolean when compiling the following function with -O3 for X86_64: int *a, b, c, d; void foo (void) { for (; d <= 0; d++) b &= ((a || d) ^ c) == 1; } What is happening is that forwprop4 changes the IR _38 = (signed int) _16; _59 = -_38; _65 = () _59; to the incorrect _55 = () _16; _65 = -_55;
[Bug tree-optimization/112736] New: vectorizer is introducing out of bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

            Bug ID: 112736
           Summary: vectorizer is introducing out of bounds memory access
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function (from gcc.dg/torture/pr68379.c)

int a, b[3], c[3][5];

void
fn1 ()
{
  int e;
  for (a = 2; a >= 0; a--)
    for (e = 0; e < 4; e++)
      c[a][e] = b[a];
}

generates out-of-bounds memory accesses (the three movdqu instructions
start reading 1, 2, and 3 elements before b) when compiled with -O3 for
x86_64:

fn1:
        movdqu  b-4(%rip), %xmm1
        movdqu  b-8(%rip), %xmm2
        movl    $-1, a(%rip)
        movdqu  b-12(%rip), %xmm3
        pshufd  $255, %xmm1, %xmm0
        movups  %xmm0, c+40(%rip)
        pshufd  $255, %xmm2, %xmm0
        movups  %xmm0, c+20(%rip)
        pshufd  $255, %xmm3, %xmm0
        movaps  %xmm0, c(%rip)
        ret

The vector operations were introduced by the "vect" pass.
[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 --- Comment #8 from Krister Walfridsson --- I still see negation of a wide signed Boolean in the IR for this function. But now it is forwprop4 that changes _38 = (signed int) _16; _43 = -_38; _66 = () _43; to _56 = () _16; _66 = -_56;
[Bug tree-optimization/111668] New: vrp2 introduces invalid wide Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 Bug ID: 111668 Summary: vrp2 introduces invalid wide Boolean values Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vrp2 pass introduces an invalid wide Boolean when compiling the function int *a, b, c, d; void foo (void) { for (; d <= 0; d++) b &= ((a || d) ^ c) == 1; } What is happening is that vrp2 changes the IR _Bool _16; _66; gimple_assign to the incorrect _Bool _16; _38; _66; gimple_assign gimple_assign
[Bug analyzer/104940] RFE: integrate analyzer with an SMT solver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104940 Krister Walfridsson changed: What|Removed |Added CC||kristerw at gcc dot gnu.org --- Comment #7 from Krister Walfridsson --- I have released a new version of my tool doing GIMPLE IR to SMT conversion. This is now written in C++, and converts a bigger subset of GIMPLE. The code is available at https://github.com/kristerw/smtgcc
[Bug tree-optimization/111494] New: Signed overflow introduced by vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111494 Bug ID: 111494 Summary: Signed overflow introduced by vectorizer Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vectorizer changes the order of additions when vectorizing the loop below, but it is not changing the arithmetic to be unsigned, so it introduces new signed overflows that were not in the original program. int a[32]; int foo(int n) { int sum = 0; for (int i = 0; i < n; i++) sum += a[i]; return sum; }
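To see why reordering signed additions matters, the sketch below compares the source order with a lane-wise order and reports whether any intermediate addition leaves the int range. It is an illustration under assumptions: a vector width of 4 (LANES), a simple sequential reduction of the lanes at the end, and hypothetical helper names. The reduction order GCC actually emits may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define LANES 4  /* assumed vector width, for illustration only */

/* True if a + b does not fit in int. */
static bool add_overflows(int a, int b)
{
    long long r = (long long)a + b;
    return r > INT_MAX || r < INT_MIN;
}

/* Source order: sum += a[i] for i = 0 .. n-1. */
static bool scalar_sum_overflows(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (add_overflows(sum, a[i]))
            return true;
        sum += a[i];
    }
    return false;
}

/* Lane order: lane k accumulates a[k], a[k + LANES], ...; the lanes are
   then added together.  n must be a multiple of LANES. */
static bool lane_sum_overflows(const int *a, int n)
{
    int lane[LANES] = { 0 };
    for (int i = 0; i < n; i += LANES)
        for (int k = 0; k < LANES; k++) {
            if (add_overflows(lane[k], a[i + k]))
                return true;
            lane[k] += a[i + k];
        }
    int sum = 0;
    for (int k = 0; k < LANES; k++) {
        if (add_overflows(sum, lane[k]))
            return true;
        sum += lane[k];
    }
    return false;
}

/* Hypothetical self-check: lane 0 adds INT_MAX + 1, the source never does. */
static void self_check(void)
{
    int a[8] = { INT_MAX, -1, 0, 0, 1, 0, 0, 0 };
    assert(!scalar_sum_overflows(a, 8));
    assert(lane_sum_overflows(a, 8));
}
```

In the sample input the source order goes INT_MAX, INT_MAX - 1, ..., INT_MAX and never overflows, whereas lane 0 adds a[0] and a[4] directly.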
[Bug tree-optimization/111280] New: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is false
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111280

            Bug ID: 111280
           Summary: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is false
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC may generate an internal call to CLZ with 0 when
CLZ_DEFINED_VALUE_AT_ZERO is false, which can be seen with
gcc.c-torture/execute/920501-6.c where sccp changes a loop to

  _36 = t_10(D) != 0;
  _35 = .CLZ (t_10(D));
  _34 = 63 - _35;
  _33 = (unsigned int) _34;
  _32 = (long long unsigned int) _33;
  _31 = _32 + 1;
  b_38 = _36 ? _31 : 1;

The value _35 is not used when t_10(D) is 0, so it may be reasonable to
allow this. But the value _35 may then be any value, so _34 may overflow.
I.e., the calculation

  _34 = 63 - _35;

must be changed to be done unsigned.

And the ranges calculated during the dom3 pass claim that _35 has a range

  _35 : [irange] int [0, 63]

which is also wrong.
[Bug tree-optimization/111257] New: new signed overflow after vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111257

            Bug ID: 111257
           Summary: new signed overflow after vectorizer
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vectorizer is not removing the original scalar calculations, and they
may overflow after vectorization. This can be seen with

int a[8];

void foo(void)
{
  for (int i = 0; i < 8; i++)
    a[i] = a[i] + 5;
}

The IR for the loop before vectorization looks like

  [local count: 954449104]:
  # i_10 = PHI
  # ivtmp_4 = PHI
  _1 = a[i_10];
  _2 = _1 + 5;
  a[i_10] = _2;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  if (ivtmp_3 != 0)
    goto ; [87.50%]
  else
    goto ; [12.50%]

  [local count: 835156385]:
  goto ; [100.00%]

and it is vectorized to

  [local count: 238585440]:
  # i_10 = PHI
  # ivtmp_4 = PHI
  # vectp_a.4_9 = PHI
  # vectp_a.8_16 = PHI
  # ivtmp_19 = PHI
  vect__1.6_13 = MEM [(int *)vectp_a.4_9];
  _1 = a[i_10];
  vect__2.7_15 = vect__1.6_13 + { 5, 5, 5, 5 };
  _2 = _1 + 5;
  MEM [(int *)vectp_a.8_16] = vect__2.7_15;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  vectp_a.4_8 = vectp_a.4_9 + 16;
  vectp_a.8_17 = vectp_a.8_16 + 16;
  ivtmp_20 = ivtmp_19 + 1;
  if (ivtmp_20 < 2)
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 119292723]:
  goto ; [100.00%]

This vectorized loop still reads _1 from a[i_10] and adds 5 to it, so the
second loop iteration will add 5 to the value of a[1]. But the first
iteration has already added 5 to a[1], so we are now doing a different
calculation compared to the original loop, and this can overflow even if
the original did not.
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884 --- Comment #6 from Krister Walfridsson --- One more similar case (that may be the same as comment #3): int g; void foo(int a, int b, int c, int d, int e) { if ((10 + a) * b) { g = (c || (g >> d)) << 1; } } In this case, reassoc1 optimizes the IR for c || (g >> d) to do (c | (g >> d)) != 0 and we are now always doing the shift, even when c is true.
[Bug tree-optimization/110760] slp introduces new overflow arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760

--- Comment #3 from Krister Walfridsson ---
(In reply to Andrew Pinski from comment #1)
> I thought we decided that vector types don't apply the overflow rules and
> always just wrap ...

That makes sense. But on the other hand, PR 110495 is a similar issue, and
that was fixed...

And TYPE_OVERFLOW_WRAPS should return true for integer vectors if they
always wrap (or is it only valid for scalars? But ANY_INTEGRAL_TYPE_P is
careful to handle vectors and complex numbers too, so I thought the
ANY_INTEGRAL_TYPE_CHECK in TYPE_OVERFLOW_WRAPS means that it works for
vectors too).
[Bug tree-optimization/110760] New: slp introduces new wrapped arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760 Bug ID: 110760 Summary: slp introduces new wrapped arithmetic Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- Consider the following function from gcc.dg/vect/bb-slp-layout-5.c: int a[4], b[4], c[4]; void f1() { a[0] = b[3] - c[3]; a[1] = b[2] + c[2]; a[2] = b[1] - c[1]; a[3] = b[0] + c[0]; } This is vectorized by slp2: vector(4) int vect__1.5; vector(4) int vect__2.8; vector(4) int vect__12.10; vector(4) int vect__3.9; vector(4) int _22; vect__1.5_18 = MEM [(int *)]; vect__2.8_19 = MEM [(int *)]; vect__12.10_21 = vect__1.5_18 + vect__2.8_19; vect__3.9_20 = vect__1.5_18 - vect__2.8_19; _22 = VEC_PERM_EXPR ; MEM [(int *)] = _22; But this introduces new calculations in the temporary vectors of the unused elements: b[0] - c[0]; b[1] + c[1]; b[2] - c[2]; b[3] + c[3]; and these calculations may wrap for input where the original program did not wrap.
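A concrete input shows how the unused lanes can wrap when the source program does not. The sketch below is illustrative only: fits_int, source_overflows, vectorized_overflows and self_check are hypothetical helpers, and the vectorized form is modelled simply as one full vector add plus one full vector subtract over all four elements.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static bool fits_int(long long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* The four scalar operations of f1 in the source program. */
static bool source_overflows(const int b[4], const int c[4])
{
    return !fits_int((long long)b[3] - c[3])
        || !fits_int((long long)b[2] + c[2])
        || !fits_int((long long)b[1] - c[1])
        || !fits_int((long long)b[0] + c[0]);
}

/* Modelled vector form: every lane is both added and subtracted,
   and the permutation afterwards picks one result per lane. */
static bool vectorized_overflows(const int b[4], const int c[4])
{
    for (int k = 0; k < 4; k++)
        if (!fits_int((long long)b[k] + c[k])
            || !fits_int((long long)b[k] - c[k]))
            return true;
    return false;
}

/* Hypothetical self-check: b[0] + c[0] fits, but b[0] - c[0] does not. */
static void self_check(void)
{
    int b[4] = { INT_MIN, 0, 0, 0 };
    int c[4] = { 1, 0, 0, 0 };
    assert(!source_overflows(b, c));
    assert(vectorized_overflows(b, c));
}
```

With b[0] = INT_MIN and c[0] = 1, the source only computes b[0] + c[0], while the vector subtract also evaluates b[0] - c[0] in a lane whose result is discarded.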
[Bug tree-optimization/110554] New: more invalid wide Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110554

            Bug ID: 110554
           Summary: more invalid wide Boolean values
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The fix for PR 110487 improved the situation, but my tool still finds some
cases where GCC generates invalid values. One such case can be seen in
gcc.c-torture/compile/pr104499.c:

typedef int __attribute__((__vector_size__ (8 * sizeof (int)))) V;

V v;

void
foo (void)
{
  v = ((1 | v) != 1);
}

Here veclower2 is introducing code

  _8;
  _10;
  ...
  gimple_assign
  gimple_assign

More examples of this failure can be seen in
gcc.c-torture/compile/pr108237.c and gcc.c-torture/compile/pr54713-1.c
[Bug tree-optimization/110541] New: Invalid VEC_PERM_EXPR mask element size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110541 Bug ID: 110541 Summary: Invalid VEC_PERM_EXPR mask element size Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- tree.def says: The number of MASK elements must be the same with the number of elements in V0 and V1. The size of the inner type of the MASK and of the V0 and V1 must be the same. But tree-vectorizer creates permutations where the MASK element size is different than for V0 and V1, such as vector(8) unsigned short _79; ... _79 = VEC_PERM_EXPR <_78, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 4, 5, 6, 7, 8, 9, 10, 11 }>; where the MASK elements are of a 64-bit type. This can be seen when compiling the following function (from gcc.c-torture/compile/2717-1.c) as "gcc -S -O3" for x86_64: short inner_product (short *a, short *b) { int i; short sum = 0; for (i = 9; i >= 0; i--) sum += (*a++) * (*b++); return sum; }
[Bug tree-optimization/110495] New: fre introduces signed wrap for vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110495 Bug ID: 110495 Summary: fre introduces signed wrap for vector Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function (from gcc.dg/tree-ssa/addadd-2.c) typedef int S __attribute__((vector_size(64))); void j(S*x){ *x += __INT_MAX__; *x += __INT_MAX__; } is optimized by fre1 to void j (S * x) { vector(16) int _1; vector(16) int _2; vector(16) int _4; : _1 = *x_6(D); _2 = _1 + { 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647 }; *x_6(D) = _2; _4 = _1 + { -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF) }; *x_6(D) = _4; return; } which has signed wrap for the cases where the original did not wrap.
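One lane value for which the folded form wraps while the source does not is INT_MIN. The sketch below checks this per lane in wider arithmetic; it is an illustration only, with the hypothetical helpers fits_int, source_wraps, folded_wraps and self_check, and it models a single vector element as a scalar int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static bool fits_int(long long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Source: the element has INT_MAX added twice, one step at a time. */
static bool source_wraps(int x)
{
    long long first = (long long)x + INT_MAX;
    if (!fits_int(first))
        return true;
    return !fits_int(first + INT_MAX);
}

/* Folded form: the two constants were combined into -2 (with wrap). */
static bool folded_wraps(int x)
{
    return !fits_int((long long)x + (-2));
}

/* Hypothetical self-check for the lane value INT_MIN. */
static void self_check(void)
{
    assert(!source_wraps(INT_MIN));  /* INT_MIN + INT_MAX = -1, then INT_MAX - 1 */
    assert(folded_wraps(INT_MIN));   /* INT_MIN - 2 is out of range */
}
```

The same holds for a lane value of -INT_MAX, so the folded addition is only safe if the vector arithmetic is treated as wrapping.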
[Bug tree-optimization/110487] New: invalid wide Boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

            Bug ID: 110487
           Summary: invalid wide Boolean value
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vrp2 pass generates IR where a wide signed Boolean may get the value 1
(in addition to the valid 0 and -1).

This can be seen in gcc.c-torture/compile/pr53410-1.c

int *a, b, c, d;

void
foo (void)
{
  for (; d <= 0; d++)
    b &= ((a || d) ^ c) == 1;
}

when compiled with "gcc -O3". The vectorizer has created (correct) code

  _Bool _16;
  _66;
  ...
  _16 = a.1_1 != 0B;
  _66 = _16 ? -1 : 0;

which is then transformed by vrp2 to

  _Bool _16;
  _38;
  _66;
  ...
  _16 = a.1_1 != 0B;
  _38 = () _16;
  _66 = -_38;

_16 can be either true or false depending on the values of some global
variables, so _38 has the value 0 or -1, and _66 has the value 0 or 1.
[Bug tree-optimization/110434] New: tree-nrv introduces incorrect CLOBBER(eol)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434 Bug ID: 110434 Summary: tree-nrv introduces incorrect CLOBBER(eol) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The tree-nrv pass may introduce incorrect CLOBBER(eol) of the form ={v} {CLOBBER(eol)}; return ; One example of this can be seen by compiling gcc.c-torture/execute/921204-1.c for x86 using the flags "-O -m32", where it changes the IR union bu o; ... o = i; MEM[(union *)].b18 = _11; MEM[(union *)].b20 = _11; = o; o ={v} {CLOBBER(eol)}; return ; to just use instead of o union bu o [value-expr: ]; ... = i; MEM[(union *)&].b18 = _11; MEM[(union *)&].b20 = _11; ={v} {CLOBBER(eol)}; return ; so the CLOBBER(eol) now refers to .
[Bug tree-optimization/109626] New: forwprop introduces new signed multiplication UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109626

            Bug ID: 109626
           Summary: forwprop introduces new signed multiplication UB
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Consider the function

int foo(_Bool v0, unsigned v1, unsigned v2)
{
  signed int v5 = v1 >> v2;
  unsigned v6 = -v1;
  unsigned int v7 = v2 - v0;
  return (int)v7 * (int)v6;
}

This does not invoke undefined behavior when called as
foo(0, 0x80000000, 1), but forwprop1 optimizes this to the equivalent of

int foo(_Bool v0, unsigned v1, unsigned v2)
{
  signed int v5 = v1 >> v2;
  unsigned int v7 = v0 - v2;
  return (int)v7 * (int)v1;
}

where the signed multiplication is now calculating -1 * INT_MIN.
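The operands of the final signed multiplication can be inspected in 64-bit arithmetic to see which form leaves the 32-bit range. This is a sketch under assumptions: it takes v1 = 0x80000000, so that the rewritten form computes -1 * INT_MIN as the report states; the helper names are hypothetical; and the unsigned-to-signed conversions are implementation-defined in ISO C (GCC wraps modulo 2^32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Source form: (int)(v2 - v0) * (int)(-v1). */
static bool source_mul_overflows(bool v0, uint32_t v1, uint32_t v2)
{
    int32_t lhs = (int32_t)(v2 - v0);
    int32_t rhs = (int32_t)(0u - v1);
    return !fits_int32((int64_t)lhs * rhs);
}

/* Rewritten form: (int)(v0 - v2) * (int)v1, i.e. both operands negated. */
static bool rewritten_mul_overflows(bool v0, uint32_t v1, uint32_t v2)
{
    int32_t lhs = (int32_t)(v0 - v2);
    int32_t rhs = (int32_t)v1;
    return !fits_int32((int64_t)lhs * rhs);
}

/* Hypothetical self-check with the assumed arguments (0, 0x80000000, 1). */
static void self_check(void)
{
    assert(!source_mul_overflows(false, 0x80000000u, 1));    /* 1 * INT_MIN */
    assert(rewritten_mul_overflows(false, 0x80000000u, 1));  /* -1 * INT_MIN */
}
```

Negating both operands preserves the mathematical product, but it moves the value -1 next to INT_MIN, which is the one pair whose signed product is not representable.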
[Bug tree-optimization/108625] New: forwprop introduces new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108625 Bug ID: 108625 Summary: forwprop introduces new UB Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- Consider the function unsigned char foo(int x) { int t = -x; unsigned char t1 = t; unsigned char t2 = t; return t1 + t2; } This does not invoke undefined behavior when called as foo(0x4001), but forwprop1 optimizes this to unsigned char foo (int x) { int t; unsigned char _5; int _7; : t_2 = -x_1(D); _7 = t_2 - x_1(D); _5 = (unsigned char) _7; return _5; } where _7 has signed overflow for x = 0x4001.
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #4 from Krister Walfridsson --- I misread the comment -- it describes a possible future improvement (that I believe is not allowed). But the committed patch seems to be correct.
[Bug tree-optimization/106523] [10/11/12 Regression] forwprop miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523 --- Comment #8 from Krister Walfridsson --- This fixed most of the rotate issues my translation validation tool found. I assume the remaining issues are due to a different (but similar) bug, so I opened Bug 108440 for those. But the issue in Bug 108440 seems similar to the "Y equal to B case" discussed in comment #6, so I believe the comment is slightly wrong (as the rotate instruction will invoke UB when Y is equal to B).
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #3 from Krister Walfridsson --- Hmm. I think this is the "Y equal to B case" from bug 106523. I.e., the bugfix is not correct...
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #2 from Krister Walfridsson --- No, bug 106523 is a different issue (I have tested with a compiler that has that fixed).
[Bug tree-optimization/108440] New: rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

Bug ID: 108440
Summary: rotate optimization may introduce new UB
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC optimizes shift instructions to rotate in a way that may make the optimized IR invoke UB for cases where the original did not. This can be seen in the IR for f5 from c-c++-common/rotate-1.c:

  unsigned short int
  f5 (unsigned short int x, unsigned int y)
  {
    return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_SHORT__ - y));
  }

The IR is doing 32-bit shifts, so y = 16 does not invoke UB:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
    int _1;
    int _2;
    signed short _3;
    int _4;
    unsigned int _5;
    int _6;
    signed short _7;
    signed short _8;
    short unsigned int _11;

    <bb 2> :
    _1 = (int) x_9(D);
    _2 = _1 << y_10(D);
    _3 = (signed short) _2;
    _4 = (int) x_9(D);
    _5 = 16 - y_10(D);
    _6 = _4 >> _5;
    _7 = (signed short) _6;
    _8 = _3 | _7;
    _11 = (short unsigned int) _8;
    return _11;
  }

But forwprop1 changes this to a 16-bit rotate which invokes UB for y=16:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
    short unsigned int _13;

    <bb 2> :
    _13 = x_9(D) r<< y_10(D);
    return _13;
  }
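To make the y = 16 case concrete, here is a small sketch of the unoptimized computation done in 32 bits. It is not taken from the report: the function names are hypothetical, and unsigned arithmetic is used so that the example itself stays well defined in ISO C (the low 16 bits match what the signed IR above computes).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the unoptimized f5: both shifts are done in
   a 32-bit type and the result is truncated to 16 bits.  Both shift
   counts stay below 32 for any y in the range 0..16.  */
static uint16_t f5_wide(uint16_t x, unsigned int y)
{
    uint32_t wide = x;
    return (uint16_t)((wide << y) | (wide >> (16u - y)));
}

/* Hypothetical predicate: a 16-bit rotate only has a valid count
   when it is smaller than the bit width, as described in the report.  */
static bool rotate16_count_in_range(unsigned int y)
{
    return y < 16;
}
```

With these definitions, f5_wide(x, 16) simply returns x, whereas the same count is rejected by rotate16_count_in_range.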
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #3 from Krister Walfridsson ---

A similar case is

  int r1, r2;

  int foo(int a, int s1, int s2)
  {
    if (a & (1 << s1))
      return r1;
    if (a & (1 << s2))
      return r1;
    return r2;
  }

where reassoc2 optimizes this to always shift by s2.
[Bug tree-optimization/106990] New: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106990

Bug ID: 106990
Summary: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

When UBSan is used, match.pd disables simplifications that can remove UB. But two simplifications are missing TYPE_OVERFLOW_SANITIZED checks, making the two tests below fail to report UB when compiled with -fsanitize=undefined.

  /* (~X - ~Y) -> Y - X.  */
  int main(void)
  {
    volatile int x = -1956816001;
    volatile int y = 1999200512;
    return ~x - ~y;
  }

  /* -x & 1 -> x & 1.  */
  int main(void)
  {
    volatile int x = 0x80000000;
    return -x & 1;
  }
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #2 from Krister Walfridsson ---

This optimization is invalid if (int)1 << 33 is _not_ undefined behavior in GIMPLE!

Consider an architecture where (int)1 << 33 evaluates to 0. foo(2, 1, 33) evaluates to 0 for the original GIMPLE, but it evaluates to 2 in the optimized IR.
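The difference can be reproduced with a small model of such an architecture. The sketch below is an illustration only: shl_model is a hypothetical stand-in for a target where an out-of-range 32-bit shift yields 0 (an assumption, not documented GIMPLE semantics), and the two foo variants are hand-written from the source and the optimized IR in the original report.

```c
/* Hypothetical target model: a 32-bit left shift with an out-of-range
   count yields 0 instead of being undefined.  */
static int shl_model(int v, int count)
{
    if (count < 0 || count >= 32)
        return 0;
    return (int)((unsigned int)v << count);
}

/* foo as written in the report: the second shift is only evaluated
   when the first test succeeds.  */
static int foo_original(int x, int a, int b)
{
    int c = shl_model(1, a);
    if (x & c)
        if (x & shl_model(1, b))
            return 2;
    return 0;
}

/* foo after ifcombine: both shifts are evaluated unconditionally and
   the two tests are merged into a single mask comparison.  */
static int foo_combined(int x, int a, int b)
{
    int mask = shl_model(1, b) | shl_model(1, a);
    return ((x & mask) == mask) ? 2 : 0;
}
```

Under this model foo_original(2, 1, 33) gives 0 and foo_combined(2, 1, 33) gives 2, while both agree whenever the shift counts are in range.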
[Bug sanitizer/106885] New: -(a-b) is folded to b-a before the UBSAN pass is run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106885

Bug ID: 106885
Summary: -(a-b) is folded to b-a before the UBSAN pass is run
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
Target Milestone: ---

GCC is folding -(a-b) to b-a before the UBSAN pass is run, which may hide undefined behavior from the sanitizer. This can be seen by the following program, which invokes undefined behavior that is not detected by -fsanitize=undefined:

  int main(void)
  {
    volatile int a = 0;
    volatile int b = 0x80000000;
    return -(a - b);
  }
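Why the fold hides the overflow can be seen by checking each form separately. This is a sketch rather than anything from the report: the helper names are invented, and it relies on the GCC/Clang __builtin_sub_overflow builtin.

```c
#include <stdbool.h>

/* Hypothetical helper: true if evaluating -(a - b) step by step
   overflows int, either in the subtraction or in the negation.  */
static bool neg_of_diff_overflows(int a, int b)
{
    int d, r;
    if (__builtin_sub_overflow(a, b, &d))
        return true;
    return __builtin_sub_overflow(0, d, &r);
}

/* Hypothetical helper: true if the folded form b - a overflows int.  */
static bool folded_overflows(int a, int b)
{
    int r;
    return __builtin_sub_overflow(b, a, &r);
}
```

For a = 0 and b = INT_MIN the first helper reports an overflow (0 - INT_MIN is not representable), while the folded form INT_MIN - 0 is fine, so there is nothing left for the sanitizer to flag.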
[Bug tree-optimization/106884] New: ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

Bug ID: 106884
Summary: ifcombine may move shift so it shifts more than bitwidth
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The function foo from gcc.dg/tree-ssa/ssa-ifcombine-1.c can be called as foo(1, 1, 33) without invoking undefined behavior

  int foo (int x, int a, int b)
  {
    int c = 1 << a;
    if (x & c)
      if (x & (1 << b))
        return 2;
    return 0;
  }

But ifcombine transforms this to

  int foo (int x, int a, int b)
  {
    int c;
    int _4;
    int _10;
    int _11;
    int _12;
    int _13;

    <bb 2> :
    _10 = 1 << b_8(D);
    _11 = 1 << a_5(D);
    _12 = _10 | _11;
    _13 = x_7(D) & _12;
    if (_12 == _13)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3> :

    <bb 4> :
    # _4 = PHI <2(3), 0(2)>
    return _4;
  }

and this will now calculate 1 << 33 unconditionally for _10.
[Bug tree-optimization/106883] New: SLSR may generate signed wrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106883

Bug ID: 106883
Summary: SLSR may generate signed wrap
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

SLSR may generate new signed wrap for cases where the original did not wrap. This can be seen in the function f from gcc.dg/tree-ssa/slsr-19.c:

  int
  f (int c, int s)
  {
    int x1, x2, y1, y2;

    y1 = c + 2;
    x1 = s * y1;
    y2 = y1 + 2;
    x2 = s * y2;
    return x1 + x2;
  }

SLSR optimizes this to

  int f (int c, int s)
  {
    int y1;
    int x2;
    int x1;
    int _7;
    int slsr_9;

    <bb 2> :
    y1_2 = c_1(D) + 2;
    x1_4 = y1_2 * s_3(D);
    slsr_9 = s_3(D) * 2;
    x2_6 = x1_4 + slsr_9;
    _7 = x1_4 + x2_6;
    return _7;
  }

Calling f(-3, 0x75181005) does not make any operation wrap in the original function, but slsr_9 overflows in the optimized code.
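The two sequences can be compared operation by operation with overflow-checking arithmetic. The sketch below is not from the report; the function names are hypothetical and it uses the GCC/Clang builtins __builtin_add_overflow and __builtin_mul_overflow, assuming a 32-bit int.

```c
#include <stdbool.h>

/* Hypothetical helper: true if any signed operation in the original
   f wraps for the given arguments.  */
static bool original_wraps(int c, int s)
{
    int y1, x1, y2, x2, sum;
    return __builtin_add_overflow(c, 2, &y1)
        || __builtin_mul_overflow(s, y1, &x1)
        || __builtin_add_overflow(y1, 2, &y2)
        || __builtin_mul_overflow(s, y2, &x2)
        || __builtin_add_overflow(x1, x2, &sum);
}

/* Hypothetical helper: true if any signed operation in the sequence
   produced by SLSR wraps for the given arguments.  */
static bool slsr_wraps(int c, int s)
{
    int y1, x1, slsr, x2, sum;
    return __builtin_add_overflow(c, 2, &y1)
        || __builtin_mul_overflow(y1, s, &x1)
        || __builtin_mul_overflow(s, 2, &slsr)
        || __builtin_add_overflow(x1, slsr, &x2)
        || __builtin_add_overflow(x1, x2, &sum);
}
```

For the arguments (-3, 0x75181005) the first helper finds no wrapping operation, while the second one trips on s * 2.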
[Bug tree-optimization/106744] New: phiopt miscompiles min/max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106744

Bug ID: 106744
Summary: phiopt miscompiles min/max
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC miscompiles the following test at -O1 or higher optimization levels:

  #include <stdint.h>

  __attribute__((noinline)) uint8_t
  three_minmax1 (uint8_t xc, uint8_t xm, uint8_t xy)
  {
    uint8_t xk;
    if (xc > xm) {
      xk = (uint8_t) (xc < xy ? xc : xy);
    } else {
      xk = (uint8_t) (xm < xy ? xm : xy);
    }
    return xk;
  }

  int
  main (void)
  {
    volatile uint8_t xy = 255;
    volatile uint8_t xm = 0;
    volatile uint8_t xc = 255;
    if (three_minmax1 (xc, xm, xy) != 255)
      __builtin_abort ();
    return 0;
  }

What is happening is that phiopt transforms three_minmax1 to

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xm_3(D)>;
  return _9;

instead of the intended

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xy_4(D)>;
  return _9;
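The intended result can be cross-checked exhaustively, since there are only 256^3 inputs. The sketch below is not from the report: the helper names are made up, and the form MIN (MAX (xc, xm), xy) is a reading of the source function rather than a quote from the dump.

```c
#include <stdint.h>

static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* The function from the report, with the same branches.  */
static uint8_t three_minmax1(uint8_t xc, uint8_t xm, uint8_t xy)
{
    uint8_t xk;
    if (xc > xm)
        xk = (uint8_t)(xc < xy ? xc : xy);
    else
        xk = (uint8_t)(xm < xy ? xm : xy);
    return xk;
}

/* Assumed intended min/max form: MIN (MAX (xc, xm), xy).  */
static uint8_t intended_minmax(uint8_t xc, uint8_t xm, uint8_t xy)
{
    return min_u8(max_u8(xc, xm), xy);
}

/* Counts the inputs on which the two functions disagree.  */
static unsigned long count_mismatches(void)
{
    unsigned long mismatches = 0;
    for (unsigned xc = 0; xc < 256; xc++)
        for (unsigned xm = 0; xm < 256; xm++)
            for (unsigned xy = 0; xy < 256; xy++)
                if (three_minmax1((uint8_t)xc, (uint8_t)xm, (uint8_t)xy)
                    != intended_minmax((uint8_t)xc, (uint8_t)xm, (uint8_t)xy))
                    mismatches++;
    return mismatches;
}
```

A transformed function that takes the final MIN against xm instead of xy cannot return 255 for the test input above, because xm is 0 there.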
[Bug tree-optimization/106523] New: forwprop miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

Bug ID: 106523
Summary: forwprop miscompile
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The function f7 from testsuite/c-c++-common/rotate-2.c is miscompiled by forwprop. This can be seen by running the function as

  __attribute__((noinline)) unsigned char
  f7 (unsigned char x, unsigned int y)
  {
    unsigned int t = x;
    return (t << y) | (t >> ((-y) & 7));
  }

  int
  main (void)
  {
    volatile unsigned char x = 152;
    volatile unsigned int y = 19;

    if (f7(x, y) != 4)
      __builtin_abort ();

    return 0;
  }

This fails at -O1 and higher optimization levels.

What is happening here is that forwprop1 has optimized the function to

  _10 = x_7(D) r<< y_9(D);
  return _10;
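The expected value 4 and the mismatch with a rotate can be worked out by hand. The sketch below is only an illustration: the names are hypothetical, and rotl8_mod assumes that an 8-bit rotate reduces its count modulo 8, which is one plausible reading of what the generated rotate ends up doing and is not stated in the report.

```c
/* Hypothetical reference: the source expression evaluated in
   unsigned int and truncated to 8 bits.  Requires y < 32.  */
static unsigned char f7_reference(unsigned char x, unsigned int y)
{
    unsigned int t = x;
    return (unsigned char)((t << y) | (t >> ((-y) & 7)));
}

/* Hypothetical 8-bit rotate left with the count taken modulo 8
   (an assumption about the rotate's behavior).  */
static unsigned char rotl8_mod(unsigned char x, unsigned int y)
{
    unsigned int n = y & 7;
    unsigned int t = x;
    return (unsigned char)((t << n) | (t >> (8 - n)));
}
```

For x = 152 and y = 19 the left shift moves every set bit out of the low byte, leaving only 152 >> 5 = 4, whereas rotating 152 left by 19 mod 8 = 3 gives 196.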
[Bug tree-optimization/106513] bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

--- Comment #2 from Krister Walfridsson ---

(In reply to Andreas Schwab from comment #1)
> This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).

I thought that was allowed in GCC as the manual says (https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Integers-implementation.html#Integers-implementation)

  "As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed ‘<<’ as undefined."

If not, what behavior does the manual refer to?
[Bug tree-optimization/106513] New: bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

Bug ID: 106513
Summary: bswap is incorrectly generated
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC may incorrectly generate bswap instructions for code not doing a correct swap. This can be seen by running the function from testsuite/gcc.dg/pr40501.c as

  typedef long int int64_t;

  __attribute__((noinline)) int64_t
  swap64 (int64_t n)
  {
    return (((n & (((int64_t) 0xff) )) << 56) |
            ((n & (((int64_t) 0xff) << 8)) << 40) |
            ((n & (((int64_t) 0xff) << 16)) << 24) |
            ((n & (((int64_t) 0xff) << 24)) << 8) |
            ((n & (((int64_t) 0xff) << 32)) >> 8) |
            ((n & (((int64_t) 0xff) << 40)) >> 24) |
            ((n & (((int64_t) 0xff) << 48)) >> 40) |
            ((n & (((int64_t) 0xff) << 56)) >> 56));
  }

  int
  main (void)
  {
    volatile int64_t n = 0x8000000000000000l;

    if (swap64(n) != 0xffffffffffffff80l)
      __builtin_abort ();

    return 0;
  }

This fails at -Os and higher optimization levels.
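The reason swap64 is not a plain byte swap lies in its last term, where a negative value is shifted right. The sketch below isolates that term. It is not from the report: the names are hypothetical, and the sign extension is modeled explicitly on unsigned values, on the assumption that signed >> sign-extends negative numbers as the GCC manual documents.

```c
#include <stdint.h>

/* Hypothetical helper: the top byte moved to the bottom the way a
   real byte swap does it, with a logical shift.  */
static uint64_t top_byte_logical(uint64_t n)
{
    return (n & (UINT64_C(0xff) << 56)) >> 56;
}

/* Hypothetical helper: the same term as computed on a signed 64-bit
   operand with an arithmetic shift, modeled by sign-extending the
   byte by hand.  */
static uint64_t top_byte_arithmetic(uint64_t n)
{
    uint64_t b = n >> 56;
    return (b & 0x80) ? (b | ~UINT64_C(0xff)) : b;
}
```

For an input with only the sign bit set, the logical version yields 0x80 while the arithmetic version yields 0xffffffffffffff80, so replacing the whole expression with bswap changes the result.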