[Bug tree-optimization/116120] New: Wrong code for (a ? x : y) != (b ? x : y)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116120 Bug ID: 116120 Summary: Wrong code for (a ? x : y) != (b ? x : y) Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- GCC is miscompiling the functions in g++.dg/tree-ssa/pr50.C, such as: typedef int v4si __attribute((__vector_size__(4 * sizeof(int)))); v4si f1_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) { v4si X = a == b ? e : f; v4si Y = c == d ? e : f; return (X != Y); } The reason is that PR50 implemented match patterns of the form: (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE : FALSE But this optimization is not correct -- the optimized code gives us a different result for: a = TRUE b = FALSE x = 0 y = 0
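The counterexample can be checked with a scalar model of a single vector lane. This is only an illustrative sketch and not part of the report: the names original_lane, transformed_lane, forms_agree and check_forms are hypothetical, and the model uses C truth values (0/1) instead of vector mask values.

```c
#include <assert.h>
#include <stdbool.h>

/* One lane of the original expression: (a ? x : y) != (b ? x : y). */
static bool original_lane(bool a, bool b, int x, int y)
{
    return (a ? x : y) != (b ? x : y);
}

/* One lane of the transformed expression: (a ^ b) ? TRUE : FALSE. */
static bool transformed_lane(bool a, bool b)
{
    return (a ^ b) ? true : false;
}

/* True when both forms give the same result for the given inputs. */
static bool forms_agree(bool a, bool b, int x, int y)
{
    return original_lane(a, b, x, y) == transformed_lane(a, b);
}

/* The forms agree when x != y, but not for the reported x == y case. */
static void check_forms(void)
{
    assert(forms_agree(true, false, 1, 2));
    assert(!forms_agree(true, false, 0, 0));
}
```

With a = TRUE, b = FALSE, x = 0, y = 0 the original lane compares 0 against 0 and yields FALSE, while the transformed lane yields TRUE, so the rewrite appears to be safe only when x and y are known to differ.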
[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Bug ID: 114090 Summary: forwprop -fwrapv miscompilation Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The function f below returns an incorrect result for INT_MIN when compiled with -O1 -fwrapv for X86_64: __attribute__((noipa)) int f(int x) { int w = (x >= 0 ? x : 0); int y = -x; int z = (y >= 0 ? y : 0); return w + z; } int main () { if (f(0x8000) != 0) __builtin_abort (); return 0; } What is happening is that forwprop has optimized w_2 = MAX_EXPR ; y_3 = -x_1(D); z_4 = MAX_EXPR ; _5 = w_2 + z_4; return _5; to _5 = ABS_EXPR ; return _5;
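The INT_MIN case can be worked through with an explicit model of wrapping arithmetic. The sketch below is an illustration under stated assumptions, not GCC code: int is modelled as int32_t, the helpers wrap_neg, wrap_add and max0 are hypothetical, and f_abs_model assumes that the operand stripped from the ABS_EXPR above is x_1(D).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two's-complement wrapping negation, modelling -fwrapv without relying on UB. */
static int32_t wrap_neg(int32_t x)
{
    uint32_t u = 0u - (uint32_t)x;
    int32_t r;
    memcpy(&r, &u, sizeof r);
    return r;
}

/* Two's-complement wrapping addition. */
static int32_t wrap_add(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;
    int32_t r;
    memcpy(&r, &u, sizeof r);
    return r;
}

/* MAX_EXPR <v, 0>. */
static int32_t max0(int32_t v)
{
    return v >= 0 ? v : 0;
}

/* The source function f, evaluated with wrapping semantics. */
static int32_t f_reference(int32_t x)
{
    int32_t w = max0(x);
    int32_t y = wrap_neg(x);
    int32_t z = max0(y);
    return wrap_add(w, z);
}

/* Model of the transformed IR: an absolute value whose negation wraps. */
static int32_t f_abs_model(int32_t x)
{
    return x >= 0 ? x : wrap_neg(x);
}

/* The two forms agree for ordinary inputs and differ only for INT32_MIN. */
static void check_wrapv_model(void)
{
    assert(f_reference(5) == f_abs_model(5));
    assert(f_reference(-5) == f_abs_model(-5));
    assert(f_reference(INT32_MIN) != f_abs_model(INT32_MIN));
}
```

In this model f_reference(INT32_MIN) is 0, because both MAX_EXPR operands are negative, while the absolute-value form returns INT32_MIN.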
[Bug tree-optimization/114056] New: ifcvt may introduce use of uninitialized variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114056 Bug ID: 114056 Summary: ifcvt may introduce use of uninitialized variables Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The ifcvt pass may make the code more UB by doing operations on uninitialized variables, which can be seen by compiling the following (from gcc.c-torture/compile/pr80422.c) with -O2 for X86_64: int a, c, f; short b, d, e; int fn1 (int h) { return a > 2 || h > a ? h : h << a; } void fn2 () { int j, k; while (1) { k = c && b; f &= e > (fn1 (k) && j); if (!d) break; } } What is happening here is that .LOOP_VECTORIZED (1, 2) != 0 branches to bb 16 with _17 uninitialized, which is then used in some calculations: _34 = .LOOP_VECTORIZED (2, 3); if (_34 != 0) goto ; [100.00%] else goto ; [100.00%] [local count: 77953654]: [local count: 708669600]: # _13 = PHI <_24(27), _17(D)(45)> _18 = _13 <= 0; _14 = _9 & _18; _27 = _13 > 0; _28 = _9 & _27; _29 = _13 < -29020049; _30 = ~_29; _31 = _14 & _30; _12 = _15 ? _3 : _13; _42 = (unsigned int) _12; _43 = _42 * 4294967222; _32 = _15 | _28; _33 = _31 | _32; _23 = _33 ? _43 : 4294967222; _24 = _33 ? _12 : _13; if (x_6(D) > _23) goto ; [11.00%] else goto ; [89.00%] This does not affect the result, but the discussion about the semantics of uninitialized variables on the mailing list a while back concluded that operations on uninitialized data are UB (with a few exceptions related to moving data...).
[Bug tree-optimization/114032] New: ifcvt may introduce UB calls to __builtin_clz(0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114032 Bug ID: 114032 Summary: ifcvt may introduce UB calls to __builtin_clz(0) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The ifcvt pass may make the code more UB, which can be seen by compiling the following function with -O3 for X86_64: int a, b, i; int scaleValueSaturate(int value) { if (value) { int result = __builtin_clz(value); if (-result <= a) return 0; } return b; } short dst; short *src; void scaleValuesSaturate() { for (; i; i++) dst = scaleValueSaturate(src[i]); } What is happening here is that the code for .LOOP_VECTORIZED (1, 2) != 0 always calls __builtin_clz, even when value is 0: [local count: 955630224]: # i.5_21 = PHI <_7(9), i.5_20(24)> _2 = (long unsigned int) i.5_21; _3 = _2 * 2; _4 = src.2_1 + _3; _5 = *_4; value.0_11 = (unsigned int) _5; result_14 = __builtin_clz (value.0_11); _47 = (unsigned int) result_14; _48 = -_47; _15 = (int) _48; _23 = _5 != 0; _28 = _15 <= a.1_16; _46 = _23 & _28; prephitmp_31 = _46 ? 0 : _30; dst = prephitmp_31; _7 = i.5_21 + 1; i = _7; if (_7 != 0) goto ; [89.00%] else goto ; [11.00%]
[Bug tree-optimization/113703] ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703 --- Comment #3 from Krister Walfridsson --- Oops. I messed up the test case... It "works", but the actual values do not make sense... The following is better: int main() { long pgsz = sysconf (_SC_PAGESIZE); void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); if (p == MAP_FAILED) return 0; mprotect (p+pgsz, pgsz, PROT_NONE); uintptr_t n = -2 - (uintptr_t)(p+pgsz); f1 (p+pgsz, -2, n); return 0; }
[Bug tree-optimization/113703] ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703 --- Comment #2 from Krister Walfridsson --- Here is a runtime testcase: #include <stdint.h> #include <sys/mman.h> #include <unistd.h> __attribute__((noipa)) void f1 (char *p, uintptr_t i, uintptr_t n) { p += i; do { *p = '\0'; p += 1; i++; } while (i < n); } int main() { long pgsz = sysconf (_SC_PAGESIZE); void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); if (p == MAP_FAILED) return 0; mprotect (p+pgsz, pgsz, PROT_NONE); uintptr_t n = -3 - (uintptr_t)p; f1 (p+2, -2, n); return 0; }
[Bug tree-optimization/113703] New: ivopts miscompiles loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703 Bug ID: 113703 Summary: ivopts miscompiles loop Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function (gcc.dg/tree-ssa/ivopts-lt.c) is miscompiled when compiled with -O1 for X86_64: #include "stdint.h" void f1 (char *p, uintptr_t i, uintptr_t n) { p += i; do { *p = '\0'; p += 1; i++; } while (i < n); } The IR after cunroll looks like: void f1 (char * p, uintptr_t i, uintptr_t n) { : p_6 = p_4(D) + i_5(D); : # p_1 = PHI # i_2 = PHI *p_1 = 0; p_9 = p_1 + 1; i_10 = i_2 + 1; if (i_10 < n_11(D)) goto ; else goto ; : goto ; : return; } This is then changed by ivopts to void f1 (char * p, uintptr_t i, uintptr_t n) { sizetype _13; char * _14; : p_6 = p_4(D) + i_5(D); _13 = n_11(D) - i_5(D); _14 = p_6 + _13; : # p_1 = PHI MEM[(char *)p_1] = 0; p_9 = p_1 + 1; if (p_9 < _14) goto ; else goto ; : goto ; : return; } Suppose the function gets called with the values: p = 0x0002 i = 0x0001 n = 0xdffd7fff The original function writes 0 to address 0x0002, and then exits. The optimized function overflows when calculating _14, and the function does the equivalent of memset(0x0002, 0, 0xdffe7ffe);
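The difference between the two exit tests can be reproduced on plain integers, without touching memory. The following is a rough sketch, not the ivopts algorithm: addresses are modelled as 32-bit unsigned values, the function names and the cap parameter (which only bounds the simulation) are hypothetical, and the inputs are chosen for illustration in the spirit of the runtime test case from the comments rather than taken from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations of the original loop, which exits when i >= n.
   The store and pointer increment are represented by the counter only. */
static unsigned original_iterations(uint32_t i, uint32_t n, unsigned cap)
{
    unsigned count = 0;
    do {
        count++;
        i++;
    } while (i < n && count < cap);
    return count;
}

/* Iterations after the exit test is rewritten as a pointer comparison. */
static unsigned rewritten_iterations(uint32_t p, uint32_t i, uint32_t n,
                                     unsigned cap)
{
    uint32_t cur = p + i;          /* p_6 = p + i */
    uint32_t end = cur + (n - i);  /* _14 = p_6 + (n - i), which may wrap */
    unsigned count = 0;
    do {
        count++;
        cur++;
    } while (cur < end && count < cap);
    return count;
}

/* For ordinary inputs both forms run the same number of iterations. */
static void check_ivopts_model(void)
{
    assert(original_iterations(0, 5, 16) == 5);
    assert(rewritten_iterations(100, 0, 5, 16) == 5);
}
```

For example, with p = 0x1000, i = 0xfffffffe and n = 0xffffeffe the original loop runs once, whereas the rewritten loop computes an end value of 0xfffffffe and keeps going until the cap is reached.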
[Bug tree-optimization/113630] New: -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630 Bug ID: 113630 Summary: -fno-strict-aliasing introduces out-of-bounds memory access Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The test gcc.dg/torture/pr110799.c crashes because of an out of bounds memory access when compiled with "-O2 -fno-strict-aliasing". What is happening is that the pre pass has changed struct S { int a; }; struct M { int a, b; }; __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; : if (c_2(D) != 0) goto ; else goto ; : if (d_6(D) != 0) goto ; else goto ; r_8 = p_4(D)->a; goto ; r_7 = MEM[(struct M *)p_4(D)].a; goto ; r_5 = MEM[(struct M *)p_4(D)].b; # r_1 = PHI return r_1; } by combining bb 4 and bb 5 and doing all accesses as struct M: __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; int pretmp_9; : if (c_2(D) != 0) goto ; [50.00%] else goto ; [50.00%] : pretmp_9 = MEM[(struct M *)p_4(D)].a; goto ; : r_5 = MEM[(struct M *)p_4(D)].b; : # r_1 = PHI return r_1; } This in turn allows later passes to hoist the two loads __attribute__((noipa, noinline, noclone, no_icf)) int f (struct S * p, int c, int d) { int r; int pretmp_9; : pretmp_9 = MEM[(struct M *)p_4(D)].a; r_5 = MEM[(struct M *)p_4(D)].b; if (c_2(D) != 0) goto ; else goto ; : : # r_1 = PHI return r_1; } which now reads out of bounds when we pass a struct S as f(, 1, 1).
[Bug tree-optimization/113590] New: The vectorizer introduces signed overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113590 Bug ID: 113590 Summary: The vectorizer introduces signed overflow Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vectorizer introduces new signed overflow in the function below when compiled with -O3 for x86_64: __attribute__ ((noinline)) int liveloop (int start, int n, int *x, int *y) { int i = start; int j; int ret; for (j = 0; j < n; ++j) { i += 1; x[j] = i; ret = y[j]; } return ret; } The vectorized loop looks like: [local count: 860067200]: # vect_vec_iv_.9_57 = PHI <_58(6), _55(9)> # vectp_x.11_61 = PHI # ivtmp_64 = PHI _58 = vect_vec_iv_.9_57 + { 4, 4, 4, 4 }; vect_i_13.10_60 = vect_vec_iv_.9_57 + { 1, 1, 1, 1 }; MEM [(int *)vectp_x.11_61] = vect_i_13.10_60; vectp_x.11_62 = vectp_x.11_61 + 16; ivtmp_65 = ivtmp_64 + 1; if (ivtmp_65 < bnd.5_47) goto ; [89.00%] else goto ; [11.00%] [local count: 765459809]: goto ; [100.00%] The problem arises from _58, which may overflow in the last iteration. For example, if the function is called as liveloop(0x7ff1, 12, p, q);
[Bug tree-optimization/113588] New: The vectorizer is introducing out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113588 Bug ID: 113588 Summary: The vectorizer is introducing out-of-bounds memory access Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function is miscompiled for x86_64 when compiled with -O3 -march=x86-64-v2 unsigned long foo (const char *s, unsigned long n) { unsigned long len = 0; while (*s++ && n--) ++len; return len; } The original function reads two bytes from 's' when called as: char a[4]; a[0] = 1; a[1] = 0; foo(a, 1000); However, the vectorized function reads 16 bytes (thereby accessing the buffer out of bounds) as it reads one vector at a time when s[0] != 0 and n >= 16.
[Bug tree-optimization/113424] lim fails to notice possible aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424 Krister Walfridsson changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #4 from Krister Walfridsson --- That makes sense. And it means the check for local variables I have implemented in smtgcc needs some improvements... Anyway, to answer the question from comment 2 (which I guess is irrelevant now): the code is a slightly modified g++.dg/opt/pr80436.C which smtgcc claimed was miscompiled because of this issue.
[Bug tree-optimization/113424] New: lim fails to notice possible aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424 Bug ID: 113424 Summary: lim fails to notice possible aliasing Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The lim pass miscompiles the following C++ program when compiled with -O3 for x86_64 (note: it works as intended when compiled as a C program) struct { char elt1; char bits; } *a; char bar (char *x, char b) { if (0) next_bit: return 1; while (1) { if (b) if (a->bits) goto next_bit; *x = b; if (a->elt1) return 0; a = 0; } } The loop that lim gets as input looks as follows: if (b_9(D) != 0) goto ; else goto ; a.0_1 = a; _2 = a.0_1->bits; if (_2 != 0) goto ; else goto ; *x_10(D) = b_9(D); a.1_3 = a; _4 = a.1_3->elt1; if (_4 != 0) goto ; [5.50%] else goto ; [94.50%] a = 0B; goto ; [100.00%] The lim pass changes this to load `a` before the loop and uses the same value of `a` for both accesses in bb4 and bb5, which is not correct as the store `*x_10(D)` may have modified `a` before the access in bb5.
[Bug tree-optimization/112949] evrp produces incorrect range for __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949 --- Comment #3 from Krister Walfridsson --- The C program is obviously UB. But the optimization is done on GIMPLE, and it is not obvious to me that the GIMPLE code is UB -- we have a function called __builtin_clz that calls an internal function, so they are different...
[Bug tree-optimization/112949] New: evrp produces incorrect range for __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949 Bug ID: 112949 Summary: evrp produces incorrect range for __builtin_clz Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The evrp pass generates incorrect ranges for __builtin_clz when it is called within a function named __builtin_clz. While calling it in this manner seems questionable, two relatively recent tests in the testsuite (gcc.dg/pr100521.c and gcc.dg/pr100790.c) suggest that gcc should handle this. The test case gcc.dg/pr100790.c is as follows: __builtin_clz(int x) { x ? __builtin_clz(x) : 32; } Compiling this for x86_64 using -O3 -fpermissive results in the evrp IR: Global Exported: iftmp.0_3 = [irange] int [1, 31] __attribute__((nothrow, leaf, const)) int __builtin_clz (int x) { int iftmp.0_3; : if (x_1(D) != 0) goto ; [INV] else goto ; [INV] : iftmp.0_3 = __builtin_clz (x_1(D)); : return; } The range for iftmp.0_3 (which is an internal call to CFN_BUILT_IN_CLZ) should be [0, 31], not [1, 31].
[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 --- Comment #9 from Krister Walfridsson --- I opened PR 112738 for the issue mentioned in comment 8.
[Bug tree-optimization/112738] New: forwprop4 introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112738 Bug ID: 112738 Summary: forwprop4 introduces invalid wide signed Boolean values Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The forwprop4 pass introduces an invalid wide Boolean when compiling the following function with -O3 for X86_64: int *a, b, c, d; void foo (void) { for (; d <= 0; d++) b &= ((a || d) ^ c) == 1; } What is happening is that forwprop4 changes the IR _38 = (signed int) _16; _59 = -_38; _65 = () _59; to the incorrect _55 = () _16; _65 = -_55;
[Bug tree-optimization/112736] New: vectorizer is introducing out of bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736 Bug ID: 112736 Summary: vectorizer is introducing out of bounds memory access Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function (from gcc.dg/torture/pr68379.c) int a, b[3], c[3][5]; void fn1 () { int e; for (a = 2; a >= 0; a--) for (e = 0; e < 4; e++) c[a][e] = b[a]; } generates out-of-bounds memory accesses (where the three movdqu instructions read 1, 2, and 3 elements before b) when compiled with -O3 for x86_64: fn1: movdqu b-4(%rip), %xmm1 movdqu b-8(%rip), %xmm2 movl $-1, a(%rip) movdqu b-12(%rip), %xmm3 pshufd $255, %xmm1, %xmm0 movups %xmm0, c+40(%rip) pshufd $255, %xmm2, %xmm0 movups %xmm0, c+20(%rip) pshufd $255, %xmm3, %xmm0 movaps %xmm0, c(%rip) ret The vector operations were introduced by the "vect" pass.
[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 --- Comment #8 from Krister Walfridsson --- I still see negation of a wide signed Boolean in the IR for this function. But now it is forwprop4 that changes _38 = (signed int) _16; _43 = -_38; _66 = () _43; to _56 = () _16; _66 = -_56;
[Bug tree-optimization/111668] New: vrp2 introduces invalid wide Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668 Bug ID: 111668 Summary: vrp2 introduces invalid wide Boolean values Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vrp2 pass introduces an invalid wide Boolean when compiling the function int *a, b, c, d; void foo (void) { for (; d <= 0; d++) b &= ((a || d) ^ c) == 1; } What is happening is that vrp2 changes the IR _Bool _16; _66; gimple_assign to the incorrect _Bool _16; _38; _66; gimple_assign gimple_assign
[Bug analyzer/104940] RFE: integrate analyzer with an SMT solver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104940 Krister Walfridsson changed: What|Removed |Added CC||kristerw at gcc dot gnu.org --- Comment #7 from Krister Walfridsson --- I have released a new version of my tool doing GIMPLE IR to SMT conversion. This is now written in C++, and converts a bigger subset of GIMPLE. The code is available at https://github.com/kristerw/smtgcc
[Bug tree-optimization/111494] New: Signed overflow introduced by vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111494 Bug ID: 111494 Summary: Signed overflow introduced by vectorizer Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vectorizer changes the order of additions when vectorizing the loop below, but it is not changing the arithmetic to be unsigned, so it introduces new signed overflows that were not in the original program. int a[32]; int foo(int n) { int sum = 0; for (int i = 0; i < n; i++) sum += a[i]; return sum; }
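The effect of reordering the additions can be shown with a small overflow checker. This is a sketch under assumptions, not the vectorizer's actual schedule: it assumes four lanes where lane k accumulates a[k], a[k+4], and so on, assumes long long is wider than int, and uses hypothetical helper names throughout.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a + b does not fit in int (checked in a wider type). */
static bool add_overflows(int a, int b)
{
    long long s = (long long)a + (long long)b;
    return s > INT_MAX || s < INT_MIN;
}

/* True if summing left to right, as the scalar loop does, overflows. */
static bool in_order_sum_overflows(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (add_overflows(sum, a[i]))
            return true;
        sum += a[i];
    }
    return false;
}

/* True if a four-lane reduction overflows: each lane accumulates every
   fourth element, then the lanes are added together at the end. */
static bool four_lane_sum_overflows(const int *a, size_t n)
{
    int lane[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (add_overflows(lane[i % 4], a[i]))
            return true;
        lane[i % 4] += a[i];
    }
    int total = 0;
    for (int k = 0; k < 4; k++) {
        if (add_overflows(total, lane[k]))
            return true;
        total += lane[k];
    }
    return false;
}

/* Small values overflow in neither order. */
static void check_reduction_model(void)
{
    const int small[4] = {1, 2, 3, 4};
    assert(!in_order_sum_overflows(small, 4));
    assert(!four_lane_sum_overflows(small, 4));
}
```

For the input {INT_MAX, -1, 0, 0, 1, 0, 0, 0} the in-order sum never leaves the int range, but lane 0 computes INT_MAX + 1, which is why doing the reordered arithmetic in an unsigned type would avoid the problem.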
[Bug tree-optimization/111280] New: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is false
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111280 Bug ID: 111280 Summary: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is false Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- GCC may generate an internal call to CLZ with 0 when CLZ_DEFINED_VALUE_AT_ZERO is false, which can be seen with gcc.c-torture/execute/920501-6.c where sccp changes a loop to _36 = t_10(D) != 0; _35 = .CLZ (t_10(D)); _34 = 63 - _35; _33 = (unsigned int) _34; _32 = (long long unsigned int) _33; _31 = _32 + 1; b_38 = _36 ? _31 : 1; The value _35 is not used when t_10(D) is 0, so it may be reasonable to allow this. But the value _35 may then be any value, so _34 may overflow. I.e., the calculation _34 = 63 - _35; must be changed to be done unsigned. And the ranges calculated during the dom3 pass claim that _35 has a range _35 : [irange] int [0, 63] which also is wrong.
[Bug tree-optimization/111257] New: new signed overflow after vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111257 Bug ID: 111257 Summary: new signed overflow after vectorizer Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vectorizer is not removing the original scalar calculations, and they may overflow after vectorization. This can be seen with int a[8]; void foo(void) { for (int i = 0; i < 8; i++) a[i] = a[i] + 5; } The IR for the loop before vectorization looks like [local count: 954449104]: # i_10 = PHI # ivtmp_4 = PHI _1 = a[i_10]; _2 = _1 + 5; a[i_10] = _2; i_7 = i_10 + 1; ivtmp_3 = ivtmp_4 - 1; if (ivtmp_3 != 0) goto ; [87.50%] else goto ; [12.50%] [local count: 835156385]: goto ; [100.00%] and it is vectorized to [local count: 238585440]: # i_10 = PHI # ivtmp_4 = PHI # vectp_a.4_9 = PHI # vectp_a.8_16 = PHI # ivtmp_19 = PHI vect__1.6_13 = MEM [(int *)vectp_a.4_9]; _1 = a[i_10]; vect__2.7_15 = vect__1.6_13 + { 5, 5, 5, 5 }; _2 = _1 + 5; MEM [(int *)vectp_a.8_16] = vect__2.7_15; i_7 = i_10 + 1; ivtmp_3 = ivtmp_4 - 1; vectp_a.4_8 = vectp_a.4_9 + 16; vectp_a.8_17 = vectp_a.8_16 + 16; ivtmp_20 = ivtmp_19 + 1; if (ivtmp_20 < 2) goto ; [50.00%] else goto ; [50.00%] [local count: 119292723]: goto ; [100.00%] This vectorized loop still reads _1 from a[i_10] and adds 5 to it, so the second loop iteration will add 5 to the value of a[1]. But the first iteration has already added 5 to a[1], so we are now doing a different calculation compared to the original loop, and this can overflow even if the original did not.
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884 --- Comment #6 from Krister Walfridsson --- One more similar case (that may be the same as comment #3): int g; void foo(int a, int b, int c, int d, int e) { if ((10 + a) * b) { g = (c || (g >> d)) << 1; } } In this case, reassoc1 optimizes the IR for c || (g >> d) to do (c | (g >> d)) != 0 and we are now always doing the shift, even when c is true.
[Bug tree-optimization/110760] slp introduces new overflow arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760 --- Comment #3 from Krister Walfridsson --- (In reply to Andrew Pinski from comment #1) > I thought we decided that vector types don't apply the overflow rules and > always just wrap ... That makes sense. But on the other hand, PR 110495 is a similar issue, and that was fixed... And TYPE_OVERFLOW_WRAPS should return true for integer vectors if they always wrap (or is it only valid for scalars? But ANY_INTEGRAL_TYPE_P is careful to handle vectors and complex numbers too, so I thought the ANY_INTEGRAL_TYPE_CHECK in TYPE_OVERFLOW_WRAPS means that it works for vectors too).
[Bug tree-optimization/110760] New: slp introduces new wrapped arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760 Bug ID: 110760 Summary: slp introduces new wrapped arithmetic Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- Consider the following function from gcc.dg/vect/bb-slp-layout-5.c: int a[4], b[4], c[4]; void f1() { a[0] = b[3] - c[3]; a[1] = b[2] + c[2]; a[2] = b[1] - c[1]; a[3] = b[0] + c[0]; } This is vectorized by slp2: vector(4) int vect__1.5; vector(4) int vect__2.8; vector(4) int vect__12.10; vector(4) int vect__3.9; vector(4) int _22; vect__1.5_18 = MEM [(int *)]; vect__2.8_19 = MEM [(int *)]; vect__12.10_21 = vect__1.5_18 + vect__2.8_19; vect__3.9_20 = vect__1.5_18 - vect__2.8_19; _22 = VEC_PERM_EXPR ; MEM [(int *)] = _22; But this introduces new calculations in the temporary vectors of the unused elements: b[0] - c[0]; b[1] + c[1]; b[2] - c[2]; b[3] + c[3]; and these calculations may wrap for input where the original program did not wrap.
[Bug tree-optimization/110554] New: more invalid wide Boolean values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110554 Bug ID: 110554 Summary: more invalid wide Boolean values Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The fix for PR 110487 improved the situation, but my tool still finds some cases where GCC generates invalid values. One such case can be seen in gcc.c-torture/compile/pr104499.c: typedef int __attribute__((__vector_size__ (8 * sizeof (int)))) V; V v; void foo (void) { v = ((1 | v) != 1); } Here veclower2 is introducing code _8; _10; ... gimple_assign gimple_assign More examples of this failure can be seen in gcc.c-torture/compile/pr108237.c and gcc.c-torture/compile/pr54713-1.c
[Bug tree-optimization/110541] New: Invalid VEC_PERM_EXPR mask element size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110541 Bug ID: 110541 Summary: Invalid VEC_PERM_EXPR mask element size Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- tree.def says: The number of MASK elements must be the same with the number of elements in V0 and V1. The size of the inner type of the MASK and of the V0 and V1 must be the same. But tree-vectorizer creates permutations where the MASK element size is different than for V0 and V1, such as vector(8) unsigned short _79; ... _79 = VEC_PERM_EXPR <_78, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 4, 5, 6, 7, 8, 9, 10, 11 }>; where the MASK elements are of a 64-bit type. This can be seen when compiling the following function (from gcc.c-torture/compile/2717-1.c) as "gcc -S -O3" for x86_64: short inner_product (short *a, short *b) { int i; short sum = 0; for (i = 9; i >= 0; i--) sum += (*a++) * (*b++); return sum; }
[Bug tree-optimization/110495] New: fre introduces signed wrap for vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110495 Bug ID: 110495 Summary: fre introduces signed wrap for vector Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The following function (from gcc.dg/tree-ssa/addadd-2.c) typedef int S __attribute__((vector_size(64))); void j(S*x){ *x += __INT_MAX__; *x += __INT_MAX__; } is optimized by fre1 to void j (S * x) { vector(16) int _1; vector(16) int _2; vector(16) int _4; : _1 = *x_6(D); _2 = _1 + { 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647 }; *x_6(D) = _2; _4 = _1 + { -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF) }; *x_6(D) = _4; return; } which has signed wrap for the cases where the original did not wrap.
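A scalar model of one vector element shows an input where only the folded form overflows. This is an illustrative sketch, with hypothetical helper names and the assumption that long long is wider than int; it checks each addition in a wider type instead of performing an overflowing int operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a + b does not fit in int (checked in a wider type). */
static bool add_overflows(int a, int b)
{
    long long s = (long long)a + (long long)b;
    return s > INT_MAX || s < INT_MIN;
}

/* Original element computation: x + INT_MAX, then + INT_MAX again. */
static bool original_overflows(int x)
{
    if (add_overflows(x, INT_MAX))
        return true;
    return add_overflows(x + INT_MAX, INT_MAX);
}

/* Folded element computation: x + (-2), where -2 is the wrapped value
   of INT_MAX + INT_MAX. */
static bool folded_overflows(int x)
{
    return add_overflows(x, -2);
}

/* INT_MIN is fine in the original form but overflows in the folded one. */
static void check_fold_model(void)
{
    assert(!original_overflows(INT_MIN));
    assert(folded_overflows(INT_MIN));
}
```

For x = INT_MIN the original steps give -1 and then INT_MAX - 1, both representable, while the folded form computes INT_MIN - 2.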
[Bug tree-optimization/110487] New: invalid wide Boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487 Bug ID: 110487 Summary: invalid wide Boolean value Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The vrp2 pass generates IR where a may get the value 1 (in addition to the valid 0 and -1). This can be seen in gcc.c-torture/compile/pr53410-1.c int *a, b, c, d; void foo (void) { for (; d <= 0; d++) b &= ((a || d) ^ c) == 1; } when compiled as "gcc -O3". The vectorizer has created (correct) code _Bool _16; _66; ... _16 = a.1_1 != 0B; _66 = _16 ? -1 : 0; which then is transformed by vrp2 to _Bool _16; _38; _66; ... _16 = a.1_1 != 0B; _38 = () _16; _66 = -_38; _16 can be both true/false depending on the values of some global variables, so _38 has the value 0 or -1, and _66 has the value 0 or 1.
[Bug tree-optimization/110434] New: tree-nrv introduces incorrect CLOBBER(eol)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434 Bug ID: 110434 Summary: tree-nrv introduces incorrect CLOBBER(eol) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The tree-nrv pass may introduce incorrect CLOBBER(eol) of the form ={v} {CLOBBER(eol)}; return ; One example of this can be seen by compiling gcc.c-torture/execute/921204-1.c for x86 using the flags "-O -m32", where it changes the IR union bu o; ... o = i; MEM[(union *)].b18 = _11; MEM[(union *)].b20 = _11; = o; o ={v} {CLOBBER(eol)}; return ; to just use instead of o union bu o [value-expr: ]; ... = i; MEM[(union *)&].b18 = _11; MEM[(union *)&].b20 = _11; ={v} {CLOBBER(eol)}; return ; so the CLOBBER(eol) now refers to .
[Bug tree-optimization/109626] New: forwprop introduces new signed multiplication UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109626 Bug ID: 109626 Summary: forwprop introduces new signed multiplication UB Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- Consider the function int foo(_Bool v0, unsigned v1, unsigned v2) { signed int v5 = v1 >> v2; unsigned v6 = -v1; unsigned int v7 = v2 - v0; return (int)v7 * (int)v6; } This does not invoke undefined behavior when called as foo(0, 0x8000, 1), but forwprop1 optimizes this to the equivalent of int foo(_Bool v0, unsigned v1, unsigned v2) { signed int v5 = v1 >> v2; unsigned int v7 = v0 - v2; return (int)v7 * (int)v1; } where the signed multiplication now is calculating -1 * INT_MIN.
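The two multiplications can be compared by evaluating them in a wider type. The sketch below is an illustration with hypothetical helper names; it models unsigned and int as 32-bit types and assumes the v1 argument is the INT_MIN bit pattern 0x80000000, which matches the -1 * INT_MIN product described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed value. */
static int32_t as_signed(uint32_t u)
{
    int32_t r;
    memcpy(&r, &u, sizeof r);
    return r;
}

/* True if a * b does not fit in 32 bits (checked in 64 bits). */
static bool mul_overflows(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return p > INT32_MAX || p < INT32_MIN;
}

/* Original: (int)(v2 - v0) * (int)(-v1). */
static bool original_mul_overflows(bool v0, uint32_t v1, uint32_t v2)
{
    uint32_t v6 = 0u - v1;
    uint32_t v7 = v2 - (uint32_t)v0;
    return mul_overflows(as_signed(v7), as_signed(v6));
}

/* After forwprop1: (int)(v0 - v2) * (int)v1. */
static bool transformed_mul_overflows(bool v0, uint32_t v1, uint32_t v2)
{
    uint32_t v7 = (uint32_t)v0 - v2;
    return mul_overflows(as_signed(v7), as_signed(v1));
}

/* Small operands overflow in neither form. */
static void check_mul_model(void)
{
    assert(!original_mul_overflows(false, 3u, 5u));
    assert(!transformed_mul_overflows(false, 3u, 5u));
}
```

With v0 = 0, v1 = 0x80000000 and v2 = 1 the original computes 1 * INT_MIN, which is representable, while the transformed form computes -1 * INT_MIN, which is not.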
[Bug tree-optimization/108625] New: forwprop introduces new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108625 Bug ID: 108625 Summary: forwprop introduces new UB Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- Consider the function unsigned char foo(int x) { int t = -x; unsigned char t1 = t; unsigned char t2 = t; return t1 + t2; } This does not invoke undefined behavior when called as foo(0x4001), but forwprop1 optimizes this to unsigned char foo (int x) { int t; unsigned char _5; int _7; : t_2 = -x_1(D); _7 = t_2 - x_1(D); _5 = (unsigned char) _7; return _5; } where _7 has signed overflow for x = 0x4001.
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #4 from Krister Walfridsson --- I misread the comment -- it describes a possible future improvement (that I believe is not allowed). But the committed patch seems to be correct.
[Bug tree-optimization/106523] [10/11/12 Regression] forwprop miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523 --- Comment #8 from Krister Walfridsson --- This fixed most of the rotate issues my translation validation tool found. I assume the remaining issues are due to a different (but similar) bug, so I opened Bug 108440 for those. But the issue in Bug 108440 seems similar to the "Y equal to B case" discussed in comment #6, so I believe the comment is slightly wrong (as the rotate instruction will invoke UB when Y is equal to B).
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #3 from Krister Walfridsson --- Hmm. I think this is the "Y equal to B case" from bug 106523. I.e., the bugfix is not correct...
[Bug tree-optimization/108440] rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440 --- Comment #2 from Krister Walfridsson --- No, bug 106523 is a different issue (I have tested with a compiler that has that fixed).
[Bug tree-optimization/108440] New: rotate optimization may introduce new UB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

Bug ID: 108440
Summary: rotate optimization may introduce new UB
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC optimizes shift instructions to rotate in a way that may make the optimized IR invoke UB for cases where the original did not. This can be seen in the IR for f5 from c-c++-common/rotate-1.c:

  unsigned short int
  f5 (unsigned short int x, unsigned int y)
  {
    return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_SHORT__ - y));
  }

The IR is doing 32-bit shifts, so y = 16 does not invoke UB:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
    int _1;
    int _2;
    signed short _3;
    int _4;
    unsigned int _5;
    int _6;
    signed short _7;
    signed short _8;
    short unsigned int _11;

    <bb 2> :
    _1 = (int) x_9(D);
    _2 = _1 << y_10(D);
    _3 = (signed short) _2;
    _4 = (int) x_9(D);
    _5 = 16 - y_10(D);
    _6 = _4 >> _5;
    _7 = (signed short) _6;
    _8 = _3 | _7;
    _11 = (short unsigned int) _8;
    return _11;
  }

But forwprop1 changes this to a 16-bit rotate which invokes UB for y=16:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
    short unsigned int _13;

    <bb 2> :
    _13 = x_9(D) r<< y_10(D);
    return _13;
  }
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #3 from Krister Walfridsson ---
A similar case is

  int r1, r2;

  int foo(int a, int s1, int s2)
  {
    if (a & (1 << s1))
      return r1;
    if (a & (1 << s2))
      return r1;
    return r2;
  }

where reassoc2 optimizes this to always shift by s2.
[Bug tree-optimization/106990] New: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106990

Bug ID: 106990
Summary: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

When UBSan is used, match.pd disables simplifications that can remove UB. But two simplifications are missing TYPE_OVERFLOW_SANITIZED checks, making the two tests below fail to report UB when compiled with -fsanitize=undefined.

  /* (~X - ~Y) -> Y - X.  */
  int main(void)
  {
    volatile int x = -1956816001;
    volatile int y = 1999200512;
    return ~x - ~y;
  }

  /* -x & 1 -> x & 1.  */
  int main(void)
  {
    volatile int x = 0x8000;
    return -x & 1;
  }
[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884 --- Comment #2 from Krister Walfridsson --- This optimization is invalid if (int)1 << 33 is _not_ undefined behavior in GIMPLE! Consider an architecture where (int)1 << 33 evaluates to 0. foo(2, 1, 33) evaluates to 0 for the original GIMPLE, but it evaluates to 2 in the optimized IR.
[Bug sanitizer/106885] New: -(a-b) is folded to b-a before the UBSAN pass is run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106885

Bug ID: 106885
Summary: -(a-b) is folded to b-a before the UBSAN pass is run
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: sanitizer
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org
Target Milestone: ---

GCC is folding -(a-b) to b-a before the UBSAN pass is run, which may hide undefined behavior from the sanitizer. This can be seen by the following program, which invokes undefined behavior that is not detected by -fsanitize=undefined:

  int main(void)
  {
    volatile int a = 0;
    volatile int b = 0x8000;
    return -(a - b);
  }
[Bug tree-optimization/106884] New: ifcombine may move shift so it shifts more than bitwidth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

Bug ID: 106884
Summary: ifcombine may move shift so it shifts more than bitwidth
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The function foo from gcc.dg/tree-ssa/ssa-ifcombine-1.c can be called as foo(1, 1, 33) without invoking undefined behavior

  int foo (int x, int a, int b)
  {
    int c = 1 << a;
    if (x & c)
      if (x & (1 << b))
        return 2;
    return 0;
  }

But ifcombine transforms this to

  int foo (int x, int a, int b)
  {
    int c;
    int _4;
    int _10;
    int _11;
    int _12;
    int _13;

    <bb 2> :
    _10 = 1 << b_8(D);
    _11 = 1 << a_5(D);
    _12 = _10 | _11;
    _13 = x_7(D) & _12;
    if (_12 == _13)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3> :

    <bb 4> :
    # _4 = PHI <2(3), 0(2)>
    return _4;
  }

and this will now calculate 1 << 33 unconditionally for _10.
[Bug tree-optimization/106883] New: SLSR may generate signed wrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106883

Bug ID: 106883
Summary: SLSR may generate signed wrap
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

SLSR may generate new signed wrap for cases where the original did not wrap. This can be seen in the function f from gcc.dg/tree-ssa/slsr-19.c:

  int
  f (int c, int s)
  {
    int x1, x2, y1, y2;

    y1 = c + 2;
    x1 = s * y1;
    y2 = y1 + 2;
    x2 = s * y2;
    return x1 + x2;
  }

SLSR optimizes this to

  int f (int c, int s)
  {
    int y1;
    int x2;
    int x1;
    int _7;
    int slsr_9;

    <bb 2> :
    y1_2 = c_1(D) + 2;
    x1_4 = y1_2 * s_3(D);
    slsr_9 = s_3(D) * 2;
    x2_6 = x1_4 + slsr_9;
    _7 = x1_4 + x2_6;
    return _7;
  }

Calling f(-3, 0x75181005) does not make any operation wrap in the original function, but slsr_9 overflows in the optimized code.
[Bug tree-optimization/106744] New: phiopt miscompiles min/max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106744

Bug ID: 106744
Summary: phiopt miscompiles min/max
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC miscompiles the following test at -O1 or higher optimization levels:

  #include <stdint.h>

  __attribute__((noinline)) uint8_t
  three_minmax1 (uint8_t xc, uint8_t xm, uint8_t xy)
  {
    uint8_t xk;
    if (xc > xm) {
      xk = (uint8_t) (xc < xy ? xc : xy);
    } else {
      xk = (uint8_t) (xm < xy ? xm : xy);
    }
    return xk;
  }

  int
  main (void)
  {
    volatile uint8_t xy = 255;
    volatile uint8_t xm = 0;
    volatile uint8_t xc = 255;
    if (three_minmax1 (xc, xm, xy) != 255)
      __builtin_abort ();
    return 0;
  }

What is happening is that phiopt transforms three_minmax1 to

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xm_3(D)>;
  return _9;

instead of the intended

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xy_4(D)>;
  return _9;
[Bug tree-optimization/106523] New: forwprop miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

Bug ID: 106523
Summary: forwprop miscompile
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The function f7 from testsuite/c-c++-common/rotate-2.c is miscompiled by forwprop. This can be seen by running the function as

  __attribute__((noinline)) unsigned char
  f7 (unsigned char x, unsigned int y)
  {
    unsigned int t = x;
    return (t << y) | (t >> ((-y) & 7));
  }

  int
  main (void)
  {
    volatile unsigned char x = 152;
    volatile unsigned int y = 19;
    if (f7(x, y) != 4)
      __builtin_abort ();
    return 0;
  }

This fails at -O1 and higher optimization levels.

What is happening here is that forwprop1 has optimized the function to

  _10 = x_7(D) r<< y_9(D);
  return _10;
[Bug tree-optimization/106513] bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

--- Comment #2 from Krister Walfridsson ---
(In reply to Andreas Schwab from comment #1)
> This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).

I thought that was allowed in GCC as the manual says
(https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Integers-implementation.html#Integers-implementation)

  "As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed ‘<<’ as undefined."

If not, what behavior does the manual refer to?
[Bug tree-optimization/106513] New: bswap is incorrectly generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

Bug ID: 106513
Summary: bswap is incorrectly generated
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC may incorrectly generate bswap instructions for code not doing a correct swap. This can be seen by running the function from testsuite/gcc.dg/pr40501.c as

  typedef long int int64_t;

  __attribute__((noinline)) int64_t
  swap64 (int64_t n)
  {
    return (((n & (((int64_t) 0xff) )) << 56) |
            ((n & (((int64_t) 0xff) << 8)) << 40) |
            ((n & (((int64_t) 0xff) << 16)) << 24) |
            ((n & (((int64_t) 0xff) << 24)) << 8) |
            ((n & (((int64_t) 0xff) << 32)) >> 8) |
            ((n & (((int64_t) 0xff) << 40)) >> 24) |
            ((n & (((int64_t) 0xff) << 48)) >> 40) |
            ((n & (((int64_t) 0xff) << 56)) >> 56));
  }

  int
  main (void)
  {
    volatile int64_t n = 0x8000l;

    if (swap64(n) != 0xff80l)
      __builtin_abort ();
    return 0;
  }

This fails at -Os and higher optimization levels.
[Bug tree-optimization/85762] New: [8/9 Regression] range-v3 abstraction overhead not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85762

Bug ID: 85762
Summary: [8/9 Regression] range-v3 abstraction overhead not optimized away
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

Created attachment 44124
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44124&action=edit
preprocessed source code for run_range()

GCC 8 is less aggressive than earlier versions when eliminating abstraction overhead in the range-v3 library, which can be seen with the function

  #include
  #include

  long run_range(std::vector<int> const &lengths, long to_find)
  {
    auto const found_index = ranges::distance(lengths
        | ranges::view::transform(ranges::convert_to{})
        | ranges::view::partial_sum()
        | ranges::view::take_while([=](auto const i) {
            return !(to_find < i);
          }));
    return found_index;
  }

GCC 7 compiled the loop to

  [10.87%]:
  # it$_M_current_41 = PHI <_6(4), _27(8)>
  # it$16_26 = PHI <it$16_24(4), _31(8)>
  _53 = to_find_2(D) < it$16_26;

  [100.00%]:
  # it$_M_current_23 = PHI <it$_M_current_41(5), _27(7)>
  _20 = _7 == it$_M_current_23;
  _5 = _20 | _53;
  if (_5 != 0)
    goto ; [7.36%]
  else
    goto ; [92.64%]

  [92.60%]:
  _27 = it$_M_current_23 + 4;
  if (_7 != _27)
    goto ; [3.75%]
  else
    goto ; [96.25%]

  [3.47%]:
  _29 = MEM[(const int &)it$_M_current_23 + 4];
  _30 = (long int) _29;
  _31 = it$16_26 + _30;
  goto ; [100.00%]

  [7.36%]:
  _33 = (long int) it$_M_current_23;
  _34 = (long int) _6;
  _35 = _33 - _34;
  _36 = _35 /[ex] 4;
  return _36;

while the loop compiled by GCC 8 updates some structures in each iteration

  [local count: 1478210893]:
  # it_47 = PHI <SR.352_183(4), _64(8)>
  # it$16$sum__115 = PHI <SR.353_184(4), _67(8)>
  _42 = to_find_2(D) < it$16$sum__115;

  [local count: 1651554780]:
  # it_30 = PHI <it_47(5), _64(7)>
  _46 = it_30 == SR.355_137;
  _40 = _42 | _46;
  if (_40 != 0)
    goto ; [65.00%]
  else
    goto ; [35.00%]

  [local count: 577812955]:
  SR.80_62 = MEM[(const struct __normal_iterator &)SR.354_185 + 24];
  MEM[(struct adaptor_cursor *)] = SR.80_62;
  MEM[(struct box *)].value = pos;
  SR.396_209 = MEM[(struct adaptor_cursor *)];
  _64 = it_30 + 4;
  if (_64 != SR.396_209)
    goto ; [70.00%]
  else
    goto ; [30.00%]

  [local count: 404469068]:
  _65 = MEM[(const int &)it_30 + 4];
  _66 = (long int) _65;
  _67 = _66 + it$16$sum__115;
  goto ; [100.00%]

  [local count: 1073279389]:
  _32 = it_30 - SR.352_183;
  _33 = _32 /[ex] 4;
  D.357125 ={v} {CLOBBER};
  D.311383 ={v} {CLOBBER};
  return _33;

which makes this loop about 10x slower on my computer. GCC 8 also generates lots of code setting up the function that GCC 7 manages to eliminate.

This regression was introduced by r255510:

  2017-12-08  Martin Jambor  <mjam...@suse.cz>

      PR tree-optimization/83141
      * tree-sra.c (contains_vce_or_bfcref_p): Move up in the file, also
      test for MEM_REFs implicitely changing types with padding. Remove
      inline keyword.
      (build_accesses_from_assign): Added contains_vce_or_bfcref_p checks.

To reproduce the problem, compile the attached file as

  g++ -O2 -S ranges.ii

and notice the difference in the generated code.
[Bug rtl-optimization/85594] New: ICE during expand when compiling with -fwrapv -fopenmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85594

Bug ID: 85594
Summary: ICE during expand when compiling with -fwrapv -fopenmp
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

Compiling gcc/testsuite/gcc.dg/gomp/pr81768-2.c with "-fwrapv -fopenmp" fails with an ICE:

> gcc -S -fwrapv -fopenmp pr81768-2.c
during RTL pass: expand
../pr81768-2.c: In function 'foo._omp_fn.1':
../pr81768-2.c:10:9: internal compiler error: in make_decl_rtl, at varasm.c:1322
 #pragma omp target parallel for schedule(static, 32) collapse(3)
         ^~~
0x5d230c make_decl_rtl(tree_node*)
        ../../gcc/gcc/varasm.c:1318
0x7c79bc expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../gcc/gcc/expr.c:9965
0x7d05de expand_expr
        ../../gcc/gcc/expr.h:280
0x7d05de expand_expr_addr_expr_1
        ../../gcc/gcc/expr.c:7946
0x7d0465 expand_expr_addr_expr_1
        ../../gcc/gcc/expr.c:7992
0x7c698d expand_expr_addr_expr
        ../../gcc/gcc/expr.c:8067
0x7c698d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../gcc/gcc/expr.c:11239
0x7433a0 expand_normal
        ../../gcc/gcc/expr.h:286
0x7433a0 do_compare_and_jump
        ../../gcc/gcc/dojump.c:1196
0x744253 do_jump_1(tree_code, tree_node*, tree_node*, rtx_code_label*, rtx_code_label*, profile_probability)
        ../../gcc/gcc/dojump.c:261
0x6dc3cc expand_gimple_cond
        ../../gcc/gcc/cfgexpand.c:2495
0x6dc3cc expand_gimple_basic_block
        ../../gcc/gcc/cfgexpand.c:5674
0x6dff66 execute
        ../../gcc/gcc/cfgexpand.c:6425
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug tree-optimization/85588] New: -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85588

Bug ID: 85588
Summary: -fwrapv miscompilation
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

GCC miscompiles gcc/testsuite/gcc.dg/torture/pr57656.c when using -fwrapv

> gcc -fwrapv pr57656.c
> ./a.out
Abort (core dumped)

The problem seems to be exactly the same as in PR57656 (but when using -fwrapv):

  t = 1 - ((a - b) / c);

is changed to

  t = (b - a) / c + 1;

which is not the same in this case where both (a - b) and (b - a) have the value 0x8000.

This fails in GCC 6 and newer versions. Compiling using GCC 5 produces the correct result.
[Bug c/82296] Warn for code removal due to "code never accesses array out of bounds" assumption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82296 Krister Walfridsson changed: What|Removed |Added CC||kristerw at gcc dot gnu.org --- Comment #7 from Krister Walfridsson --- The C89 rules are the same as for C11 -- you can find the relevant text in C90 6.3.6 (it does not cover the "UB 62" from the ARR30-C page, but that is because C89 does not have flexible array members...) Using -std=c89 will compile following the rules in C89, so you will not suffer from new undefined behaviors introduced in newer standards.
[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480

Krister Walfridsson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Krister Walfridsson ---
Fixed for trunk and GCC 7.3.

Closing this bug as I'm not planning to backport to GCC 6.
[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480

--- Comment #3 from Krister Walfridsson ---
Author: kristerw
Date: Fri Sep 29 21:34:00 2017
New Revision: 253309

URL: https://gcc.gnu.org/viewcvs?rev=253309&root=gcc&view=rev
Log:
2017-09-29  Krister Walfridsson

    Backport from mainline
    2017-06-29  Maya Rashish

    PR target/77480
    * config/netbsd.h (NETBSD_LIB_SPEC): Add -lc when creating shared
    objects.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/netbsd.h
[Bug target/39570] cabs and cabsf are named differently on NetBSD 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39570

--- Comment #14 from Krister Walfridsson ---
Author: kristerw
Date: Fri Sep 29 09:38:08 2017
New Revision: 253283

URL: https://gcc.gnu.org/viewcvs?rev=253283&root=gcc&view=rev
Log:
2017-09-29  Krister Walfridsson

    Backport from mainline
    2017-09-26  Krister Walfridsson

    PR target/39570
    * gcc/config/netbsd-protos.h: New file.
    * gcc/config/netbsd.c: New file.
    * gcc/config/netbsd.h (SUBTARGET_INIT_BUILTINS): Define.
    * gcc/config/t-netbsd: New file.
    * gcc/config.gcc (tm_p_file): Add netbsd-protos.h.
    (tmake_file) Add t-netbsd.
    (extra_objs) Add netbsd.o.

Added:
    branches/gcc-7-branch/gcc/config/netbsd-protos.h
    branches/gcc-7-branch/gcc/config/netbsd.c
    branches/gcc-7-branch/gcc/config/t-netbsd
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config.gcc
    branches/gcc-7-branch/gcc/config/netbsd.h
[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480 Krister Walfridsson changed: What|Removed |Added CC||kristerw at gcc dot gnu.org --- Comment #2 from Krister Walfridsson --- Fixed on trunk by r249822
[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

Krister Walfridsson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Krister Walfridsson ---
Fixed for trunk and GCC 7.3 (GCC 6 and 5 do not have this problem).
[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

--- Comment #12 from Krister Walfridsson ---
Author: kristerw
Date: Thu Sep 28 19:17:51 2017
New Revision: 253263

URL: https://gcc.gnu.org/viewcvs?rev=253263&root=gcc&view=rev
Log:
gcc/ChangeLog:

    Backport from mainline
    2017-05-14  Krister Walfridsson

    PR target/80600
    * config/netbsd.h (NETBSD_LIBGCC_SPEC): Always add -lgcc.

libgcc/ChangeLog:

    Backport from mainline
    2017-05-14  Krister Walfridsson

    PR target/80600
    * config.host (*-*-netbsd*): Add t-slibgcc-libgcc to tmake_file.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/netbsd.h
    branches/gcc-7-branch/libgcc/ChangeLog
    branches/gcc-7-branch/libgcc/config.host
[Bug target/39570] cabs and cabsf are named differently on NetBSD 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39570

--- Comment #13 from Krister Walfridsson ---
Author: kristerw
Date: Tue Sep 26 21:26:21 2017
New Revision: 253216

URL: https://gcc.gnu.org/viewcvs?rev=253216&root=gcc&view=rev
Log:
2017-09-26  Krister Walfridsson

    PR target/39570
    * gcc/config/netbsd-protos.h: New file.
    * gcc/config/netbsd.c: New file.
    * gcc/config/netbsd.h (SUBTARGET_INIT_BUILTINS): Define.
    * gcc/config/t-netbsd: New file.
    * gcc/config.gcc (tm_p_file): Add netbsd-protos.h.
    (tmake_file) Add t-netbsd.
    (extra_objs) Add netbsd.o.

Added:
    trunk/gcc/config/netbsd-protos.h
    trunk/gcc/config/netbsd.c
    trunk/gcc/config/t-netbsd
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config.gcc
    trunk/gcc/config/netbsd.h
[Bug middle-end/82177] Alias analysis too aggressive with integer-to-pointer cast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177 Krister Walfridsson changed: What|Removed |Added CC||kristerw at gcc dot gnu.org --- Comment #5 from Krister Walfridsson --- Did you mean PR61502 - "== comparison on "one-past" pointer gives wrong result"?
[Bug tree-optimization/81554] New: [8 Regression] 25% performance regression in Himeno benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81554

Bug ID: 81554
Summary: [8 Regression] 25% performance regression in Himeno benchmark
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

Created attachment 41831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41831&action=edit
The Himeno benchmark

The Himeno benchmark from the Phoronix test suite lost 25% of its performance with r248771, which fixed PR 66313 ("Unsafe factorization of a*b+a*c").

The benchmark is attached; it can be compiled as

  gcc -O3 himenobmtxpa.c

and run as

  ./a.out s

I see a 15% slowdown when the benchmark is compiled with "-O3" and 25% when compiled with "-O3 -march=native" on a Broadwell CPU.
[Bug tree-optimization/81409] New: Inefficient loops generated from range-v3 code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81409

Bug ID: 81409
Summary: Inefficient loops generated from range-v3 code
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

Created attachment 41728
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41728&action=edit
Preprocessed file for run_range()

The range-v3 (https://github.com/ericniebler/range-v3) function

  long run_range(std::vector<int> const &lengths, long to_find)
  {
    auto const found_index = ranges::distance(lengths
        | ranges::view::transform(ranges::convert_to{})
        | ranges::view::partial_sum()
        | ranges::view::take_while([=](auto const i) {
            return !(to_find < i);
          }));
    return found_index;
  }

is generated as slow code with GCC, needing 3x the time to run compared to the code generated by LLVM (when compiled with "-O3 -std=c++14 -DNDEBUG").

The calculation done in run_range() is the equivalent of

  long run_forloop(std::vector<int> const &vec, long to_find)
  {
    long len = vec.end() - vec.begin();
    const int *p = &vec[0];

    long i, acc = 0;
    for (i = 0; i < len; i++) {
      acc += p[i];
      if (to_find < acc)
        break;
    }
    return i;
  }

and LLVM manages to generate similar code for both functions, while GCC seems to be confused by the run_range() loop and generates extra comparisons and a somewhat messy code flow...
[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

--- Comment #11 from Krister Walfridsson ---
Author: kristerw
Date: Sun May 14 22:49:03 2017
New Revision: 248037

URL: https://gcc.gnu.org/viewcvs?rev=248037&root=gcc&view=rev
Log:
PR target/80600 - hidden symbol '__cpu_model' is referenced by DSO

gcc/ChangeLog:

    PR target/80600
    * config/netbsd.h (NETBSD_LIBGCC_SPEC): Always add -lgcc.

libgcc/ChangeLog:

    PR target/80600
    * config.host (*-*-netbsd*): Add t-slibgcc-libgcc to tmake_file.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/netbsd.h
    trunk/libgcc/ChangeLog
    trunk/libgcc/config.host