from:"kristerw at gcc dot gnu.org"

[Bug tree-optimization/116120] New: Wrong code for (a ? x : y) != (b ? x : y)

2024-07-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116120

Bug ID: 116120
   Summary: Wrong code for (a ? x : y) != (b ? x : y)
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC is miscompiling the functions in g++.dg/tree-ssa/pr50.C, such as:

typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
v4si f1_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
  v4si X = a == b ? e : f;
  v4si Y = c == d ? e : f;
  return (X != Y);
}

The reason is that PR50 implemented match patterns of the form:

  (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE

But this optimization is not correct -- the optimized code gives us a different
result for:
  a = TRUE
  b = FALSE
  x = 0
  y = 0
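
The counterexample can be checked with a scalar model of a single vector lane.
This is only a sketch: the names original_ne, folded_ne and count_mismatches
are hypothetical and do not come from GCC or from the test case.

```c
#include <stdbool.h>

/* One lane of the comparison as written in the source. */
bool original_ne(bool a, bool b, int x, int y)
{
  return (a ? x : y) != (b ? x : y);
}

/* One lane of what the match pattern produces: x and y are ignored. */
bool folded_ne(bool a, bool b)
{
  return a ^ b;
}

/* Count the inputs with a, b in {false, true} and x, y in {0, 1}
   for which the two forms disagree. */
int count_mismatches(void)
{
  int mismatches = 0;
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
          if (original_ne(a, b, x, y) != folded_ne(a, b))
            mismatches++;
  return mismatches;
}
```

Every mismatch has a != b and x == y, so the fold would only be valid if x and
y were known to differ.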

[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation

2024-02-24 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Bug ID: 114090
   Summary: forwprop -fwrapv miscompilation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function f below returns an incorrect result for INT_MIN when compiled with
-O1 -fwrapv for X86_64:


__attribute__((noipa)) int f(int x) {
    int w = (x >= 0 ? x : 0);
    int y = -x;
    int z = (y >= 0 ? y : 0);
    return w + z;
}

int
main ()
{
  if (f(0x80000000) != 0)
    __builtin_abort ();
  return 0;
}


What is happening is that forwprop has optimized

  w_2 = MAX_EXPR <x_1(D), 0>;
  y_3 = -x_1(D);
  z_4 = MAX_EXPR <y_3, 0>;
  _5 = w_2 + z_4;
  return _5;

to

  _5 = ABS_EXPR <x_1(D)>;
  return _5;
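
The difference for INT_MIN can be reproduced with a small model of wrapping
arithmetic. This is a sketch under assumptions: wrap_neg and wrap_add are
hypothetical helpers that mimic -fwrapv by computing in 64 bits, and f_abs
models the folded form as a wrapping negation of negative inputs.

```c
#include <stdint.h>

/* Wrapping 32-bit negation, computed in 64 bits so that the sketch itself
   never overflows a signed type. */
static int32_t wrap_neg(int32_t x)
{
  int64_t r = -(int64_t)x;
  if (r > INT32_MAX)
    r -= (int64_t)1 << 32;
  return (int32_t)r;
}

/* Wrapping 32-bit addition, computed the same way. */
static int32_t wrap_add(int32_t a, int32_t b)
{
  int64_t r = (int64_t)a + (int64_t)b;
  if (r > INT32_MAX)
    r -= (int64_t)1 << 32;
  else if (r < INT32_MIN)
    r += (int64_t)1 << 32;
  return (int32_t)r;
}

/* The function from the report: max(x, 0) + max(-x, 0). */
int32_t f_source(int32_t x)
{
  int32_t w = (x >= 0 ? x : 0);
  int32_t y = wrap_neg(x);
  int32_t z = (y >= 0 ? y : 0);
  return wrap_add(w, z);
}

/* The folded form, modelled as abs with a wrapping negation. */
int32_t f_abs(int32_t x)
{
  return x >= 0 ? x : wrap_neg(x);
}
```

For INT_MIN, the negation wraps back to INT_MIN, so both maxima in f_source are
0, while f_abs returns INT_MIN. The two forms agree for every other input.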

[Bug tree-optimization/114056] New: ifcvt may introduce use of uninitialized variables

2024-02-22 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114056

Bug ID: 114056
   Summary: ifcvt may introduce use of uninitialized variables
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The ifcvt pass may introduce new undefined behavior (UB) by operating on
uninitialized variables, which can be seen by compiling the following (from
gcc.c-torture/compile/pr80422.c) with -O2 for X86_64:


int a, c, f;
short b, d, e;

int fn1 (int h)
{
  return a > 2 || h > a ? h : h << a;
}

void fn2 ()
{
  int j, k;
  while (1)
    {
      k = c && b;
      f &= e > (fn1 (k) && j);
      if (!d)
        break;
    }
}


What is happening here is that .LOOP_VECTORIZED (1, 2) != 0 branches to bb 16
with _17 uninitialized, which is then used in some calculations:

  _34 = .LOOP_VECTORIZED (2, 3);
  if (_34 != 0)
goto ; [100.00%]
  else
goto ; [100.00%]

   [local count: 77953654]:

   [local count: 708669600]:
  # _13 = PHI <_24(27), _17(D)(45)>
  _18 = _13 <= 0;
  _14 = _9 & _18;
  _27 = _13 > 0;
  _28 = _9 & _27;
  _29 = _13 < -29020049;
  _30 = ~_29;
  _31 = _14 & _30;
  _12 = _15 ? _3 : _13;
  _42 = (unsigned int) _12;
  _43 = _42 * 4294967222;
  _32 = _15 | _28;
  _33 = _31 | _32;
  _23 = _33 ? _43 : 4294967222;
  _24 = _33 ? _12 : _13;
  if (x_6(D) > _23)
goto ; [11.00%]
  else
goto ; [89.00%]

This does not affect the result, but the discussion about the semantics of
uninitialized variables on the mailing list a while back concluded that
operations on uninitialized data are UB (with a few exceptions related to
moving data...).

[Bug tree-optimization/114032] New: ifcvt may introduce UB calls to __builtin_clz(0)

2024-02-21 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114032

Bug ID: 114032
   Summary: ifcvt may introduce UB calls to __builtin_clz(0)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The ifcvt pass may introduce new undefined behavior (UB), which can be seen by
compiling the following function with -O3 for X86_64:


int a, b, i;
int scaleValueSaturate(int value) {
  if (value) {
    int result = __builtin_clz(value);
    if (-result <= a)
      return 0;
  }
  return b;
}
short dst;
short *src;
void scaleValuesSaturate() {
  for (; i; i++)
    dst = scaleValueSaturate(src[i]);
}


What is happening here is that the code for .LOOP_VECTORIZED (1, 2) != 0 always
calls __builtin_clz, even when value is 0:

   [local count: 955630224]:
  # i.5_21 = PHI <_7(9), i.5_20(24)>
  _2 = (long unsigned int) i.5_21;
  _3 = _2 * 2;
  _4 = src.2_1 + _3;
  _5 = *_4;
  value.0_11 = (unsigned int) _5;
  result_14 = __builtin_clz (value.0_11);
  _47 = (unsigned int) result_14;
  _48 = -_47;
  _15 = (int) _48;
  _23 = _5 != 0;
  _28 = _15 <= a.1_16;
  _46 = _23 & _28;
  prephitmp_31 = _46 ? 0 : _30;
  dst = prephitmp_31;
  _7 = i.5_21 + 1;
  i = _7;
  if (_7 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

[Bug tree-optimization/113703] ivopts miscompiles loop

2024-02-01 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

--- Comment #3 from Krister Walfridsson  ---
Oops. I messed up the test case...  It "works", but the actual values do not
make sense...

The following is better:

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -2 - (uintptr_t)(p+pgsz);
  f1 (p+pgsz, -2, n);
  return 0;
}

[Bug tree-optimization/113703] ivopts miscompiles loop

2024-02-01 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

--- Comment #2 from Krister Walfridsson  ---
Here is a runtime testcase:

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

__attribute__((noipa))
void f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  mprotect (p+pgsz, pgsz, PROT_NONE);
  uintptr_t n = -3 - (uintptr_t)p;
  f1 (p+2, -2, n);
  return 0;
}

[Bug tree-optimization/113703] New: ivopts miscompiles loop

2024-02-01 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113703

Bug ID: 113703
   Summary: ivopts miscompiles loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function (gcc.dg/tree-ssa/ivopts-lt.c) is miscompiled when
compiled with -O1 for X86_64:

#include "stdint.h"

void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}


The IR after cunroll looks like:

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  :
  p_6 = p_4(D) + i_5(D);

  :
  # p_1 = PHI 
  # i_2 = PHI 
  *p_1 = 0;
  p_9 = p_1 + 1;
  i_10 = i_2 + 1;
  if (i_10 < n_11(D))
goto ;
  else
goto ;

  :
  goto ;

  :
  return;
}


This is then changed by ivopts to

void f1 (char * p, uintptr_t i, uintptr_t n)
{
  sizetype _13;
  char * _14;

  :
  p_6 = p_4(D) + i_5(D);
  _13 = n_11(D) - i_5(D);
  _14 = p_6 + _13;

  :
  # p_1 = PHI 
  MEM[(char *)p_1] = 0;
  p_9 = p_1 + 1;
  if (p_9 < _14)
goto ;
  else
goto ;

  :
  goto ;

  :
  return;
}


Suppose the function gets called with the values:

  p = 0x0002
  i = 0x0001
  n = 0xdffd7fff

The original function writes 0 to address 0x0002, and then exits.

The optimized function overflows when calculating _14, and the function does
the equivalent of
  memset(0x0002, 0, 0xdffe7ffe);
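
The effect of the wrapped end pointer can be illustrated in a scaled-down
model. This is a sketch, not the actual test: it assumes an 8-bit address
space (uint8_t standing in for both char * and uintptr_t), counts stores
instead of performing them, and uses the hypothetical names writes_original
and writes_ivopts.

```c
#include <stdint.h>

/* The original loop: the exit test compares the counter i with n. */
unsigned writes_original(uint8_t p, uint8_t i, uint8_t n)
{
  unsigned writes = 0;
  p = (uint8_t)(p + i);
  do
    {
      writes++;               /* stands for *p = '\0' */
      p = (uint8_t)(p + 1);
      i = (uint8_t)(i + 1);
    }
  while (i < n);
  return writes;
}

/* The loop after ivopts: the exit test compares p with a precomputed end
   pointer, and each of these additions and subtractions may wrap. */
unsigned writes_ivopts(uint8_t p, uint8_t i, uint8_t n)
{
  unsigned writes = 0;
  uint8_t p6 = (uint8_t)(p + i);      /* p_6 = p_4(D) + i_5(D) */
  uint8_t t13 = (uint8_t)(n - i);     /* _13 = n_11(D) - i_5(D) */
  uint8_t end = (uint8_t)(p6 + t13);  /* _14 = p_6 + _13 */
  uint8_t cur = p6;
  do
    {
      writes++;               /* stands for MEM[(char *)p_1] = 0 */
      cur = (uint8_t)(cur + 1);
    }
  while (cur < end);
  return writes;
}
```

With p = 0x12, i = 0xFE and n = 0xED (the 8-bit analogue of the runtime test
case in comment #2), the original loop stores once, while the rewritten loop
computes an end pointer of 0xFF and keeps storing until it gets there.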

[Bug tree-optimization/113630] New: -fno-strict-aliasing introduces out-of-bounds memory access

2024-01-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

Bug ID: 113630
   Summary: -fno-strict-aliasing introduces out-of-bounds memory
access
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The test gcc.dg/torture/pr110799.c crashes because of an out of bounds memory
access when compiled with "-O2 -fno-strict-aliasing".

What is happening is that the pre pass has changed

struct S {
int a;
};
struct M {
int a, b;
};

__attribute__((noipa, noinline, noclone, no_icf))
int f (struct S * p, int c, int d)
{
  int r;

  :
  if (c_2(D) != 0)
goto ;
  else
goto ;

  :
  if (d_6(D) != 0)
goto ;
  else
goto ;

  
  r_8 = p_4(D)->a;
  goto ;

  
  r_7 = MEM[(struct M *)p_4(D)].a;
  goto ;

  
  r_5 = MEM[(struct M *)p_4(D)].b;

  
  # r_1 = PHI 
  return r_1;
}


by combining bb 4 and bb 5 and doing all accesses as struct M:


__attribute__((noipa, noinline, noclone, no_icf))
int f (struct S * p, int c, int d)
{
  int r;
  int pretmp_9;

  :
  if (c_2(D) != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

  :
  pretmp_9 = MEM[(struct M *)p_4(D)].a;
  goto ;

  :
  r_5 = MEM[(struct M *)p_4(D)].b;

  :
  # r_1 = PHI 
  return r_1;
}


This in turn allows later passes to hoist the two loads


__attribute__((noipa, noinline, noclone, no_icf))
int f (struct S * p, int c, int d)
{
  int r;
  int pretmp_9;

  :
  pretmp_9 = MEM[(struct M *)p_4(D)].a;
  r_5 = MEM[(struct M *)p_4(D)].b;
  if (c_2(D) != 0)
goto ;
  else
goto ;

  :

  :
  # r_1 = PHI 
  return r_1;
}


which now reads out of bounds when we pass a struct S as f(, 1, 1).

[Bug tree-optimization/113590] New: The vectorizer introduces signed overflow

2024-01-24 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113590

Bug ID: 113590
   Summary: The vectorizer introduces signed overflow
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vectorizer introduces new signed overflow in the function below when
compiled with -O3 for x86_64:


__attribute__ ((noinline)) int
liveloop (int start, int n, int *x, int *y)
{
  int i = start;
  int j;
  int ret;

  for (j = 0; j < n; ++j)
    {
      i += 1;
      x[j] = i;
      ret = y[j];
    }
  return ret;
}


The vectorized loop looks like:

   [local count: 860067200]:
  # vect_vec_iv_.9_57 = PHI <_58(6), _55(9)>
  # vectp_x.11_61 = PHI 
  # ivtmp_64 = PHI 
  _58 = vect_vec_iv_.9_57 + { 4, 4, 4, 4 };
  vect_i_13.10_60 = vect_vec_iv_.9_57 + { 1, 1, 1, 1 };
  MEM  [(int *)vectp_x.11_61] = vect_i_13.10_60;
  vectp_x.11_62 = vectp_x.11_61 + 16;
  ivtmp_65 = ivtmp_64 + 1;
  if (ivtmp_65 < bnd.5_47)
goto ; [89.00%]
  else
goto ; [11.00%]

  [local count: 765459809]:
  goto ; [100.00%]

The problem arises from _58, which may overflow in the last iteration. For
example, if the function is called as
  liveloop(0x7ff1, 12, p, q);
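
A rough model of why the induction vector can overflow when the scalar loop
does not. This is a sketch under assumptions: the initial vector is taken to
be { start, start+1, ..., start+vf-1 }, the function names are hypothetical,
and the input values in the usage note are chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The scalar loop increments i once per iteration, so the largest value
   it ever computes is start + n. */
bool scalar_i_overflows(int32_t start, int32_t n)
{
  return n > 0 && (int64_t)start + n > INT32_MAX;
}

/* The vector loop adds { vf, ..., vf } to the induction vector in every
   iteration, including the last one, where the result is never used.
   The top lane is the first to overflow. */
bool vector_iv_overflows(int32_t start, int32_t n, int32_t vf)
{
  int64_t top_lane = (int64_t)start + (vf - 1);
  for (int32_t iter = 0; iter < n / vf; iter++)
    {
      top_lane += vf;         /* _58 = vect_vec_iv_.9_57 + { 4, 4, 4, 4 } */
      if (top_lane > INT32_MAX)
        return true;
    }
  return false;
}
```

For example, with start = 0x7ffffff1, n = 12 and vf = 4, the scalar loop tops
out at 0x7ffffffd, but the top lane of the induction vector reaches 0x80000000
in the third vector iteration.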

[Bug tree-optimization/113588] New: The vectorizer is introducing out-of-bounds memory access

2024-01-24 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113588

Bug ID: 113588
   Summary: The vectorizer is introducing out-of-bounds memory
access
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function is miscompiled for x86_64 when compiled with
-O3 -march=x86-64-v2


unsigned long
foo (const char *s, unsigned long n)
{
 unsigned long len = 0;
 while (*s++ && n--)
   ++len;
 return len;
}


The original function reads two bytes from 's' when called as:

 char a[4];
 a[0] = 1;
 a[1] = 0;
 foo(a, 1000);

However, the vectorized function reads 16 bytes (thereby accessing the buffer
out of bounds) as it reads one vector at a time when s[0] != 0 and n >= 16.
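
The two-byte claim for the scalar loop can be checked by instrumenting it.
This is a sketch: foo_counting and its bytes_read parameter are hypothetical
additions, and the loop is rewritten with explicit breaks that follow the
short-circuit order of `*s++ && n--`.

```c
#include <stddef.h>

/* The loop from the report, counting how many bytes of s are read. */
unsigned long foo_counting(const char *s, unsigned long n, size_t *bytes_read)
{
  unsigned long len = 0;
  *bytes_read = 0;
  for (;;)
    {
      char ch = *s++;         /* the only memory read in the loop */
      ++*bytes_read;
      if (!ch)
        break;                /* n-- is not evaluated when *s is 0 */
      if (!n--)
        break;
      ++len;
    }
  return len;
}
```

Called on a buffer starting with the bytes 1, 0 and with n = 1000, this
returns a length of 1 after reading exactly two bytes, so any correct
vectorization has to avoid touching memory past the terminating 0 unless it
can prove the access is safe.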

[Bug tree-optimization/113424] lim fails to notice possible aliasing

2024-01-16 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424

Krister Walfridsson  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #4 from Krister Walfridsson  ---
That makes sense. And it means the check for local variables I have implemented
in smtgcc needs some improvements...

Anyway, to answer the question from comment 2 (which I guess is irrelevant
now): the code is a slightly modified g++.dg/opt/pr80436.C which smtgcc claimed
was miscompiled because of this issue.

[Bug tree-optimization/113424] New: lim fails to notice possible aliasing

2024-01-16 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113424

Bug ID: 113424
   Summary: lim fails to notice possible aliasing
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The lim pass miscompiles the following C++ program when compiled with -O3 for
x86_64 (note: it works as intended when compiled as a C program):

struct { char elt1; char bits; } *a;
char
bar (char *x, char b)
{
  if (0)
  next_bit:
    return 1;
  while (1)
    {
      if (b)
        if (a->bits)
          goto next_bit;
      *x = b;
      if (a->elt1)
        return 0;
      a = 0;
    }
}

The loop that lim receives as input looks as follows:

  
  if (b_9(D) != 0)
goto ;
  else
goto ;

  
  a.0_1 = a;
  _2 = a.0_1->bits;
  if (_2 != 0)
goto ;
  else
goto ;

  
  *x_10(D) = b_9(D);
  a.1_3 = a;
  _4 = a.1_3->elt1;
  if (_4 != 0)
goto ; [5.50%]
  else
goto ; [94.50%]

  
  a = 0B;
  goto ; [100.00%]

The lim pass changes this to load `a` before the loop and uses the same value
of `a` for both accesses in bb4 and bb5, which is not correct as the store
`*x_10(D)` may have modified `a` before the access in bb5.

[Bug tree-optimization/112949] evrp produces incorrect range for __builtin_clz

2023-12-10 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949

--- Comment #3 from Krister Walfridsson  ---
The C program is obviously UB. But the optimization is done on GIMPLE, and it
is not obvious to me that the GIMPLE code is UB -- we have a function called
__builtin_clz that calls an internal function, so they are different...

[Bug tree-optimization/112949] New: evrp produces incorrect range for __builtin_clz

2023-12-10 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112949

Bug ID: 112949
   Summary: evrp produces incorrect range for __builtin_clz
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The evrp pass generates incorrect ranges for __builtin_clz when it is called
within a function named __builtin_clz. While calling it in this manner seems
questionable, two relatively recent tests in the testsuite (gcc.dg/pr100521.c
and gcc.dg/pr100790.c) suggest that gcc should handle this.

The test case gcc.dg/pr100790.c is as follows:

  __builtin_clz(int x) { x ? __builtin_clz(x) : 32; }

Compiling this for x86_64 using -O3 -fpermissive results in the evrp IR:

  Global Exported: iftmp.0_3 = [irange] int [1, 31]
  __attribute__((nothrow, leaf, const))
  int __builtin_clz (int x)
  {
int iftmp.0_3;

 :
if (x_1(D) != 0)
  goto ; [INV]
else
  goto ; [INV]

 :
iftmp.0_3 = __builtin_clz (x_1(D));

 :
return;

  }

The range for iftmp.0_3 (which is an internal call to CFN_BUILT_IN_CLZ) should
be [0, 31], not [1, 31].
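
That 0 is a possible result can be seen from any argument with the top bit
set. A minimal sketch, using the hypothetical name clz_or_width instead of
redefining the builtin, and relying on the GCC/Clang builtin __builtin_clz:

```c
#include <limits.h>

/* Same shape as the test case: guard the zero argument, for which
   __builtin_clz is undefined, and return the bit width instead. */
int clz_or_width(int x)
{
  return x ? __builtin_clz((unsigned int)x) : (int)(sizeof(int) * CHAR_BIT);
}
```

Any negative x has its most significant bit set and therefore no leading
zeros, so the call yields 0 for it.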

[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values

2023-11-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668

--- Comment #9 from Krister Walfridsson  ---
I opened PR 112738 for the issue mentioned in comment 8.

[Bug tree-optimization/112738] New: forwprop4 introduces invalid wide signed Boolean values

2023-11-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112738

Bug ID: 112738
   Summary: forwprop4 introduces invalid wide signed Boolean
values
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The forwprop4 pass introduces an invalid wide Boolean when compiling the
following function with -O3 for X86_64:

  int *a, b, c, d;
  void
  foo (void)
  {
    for (; d <= 0; d++)
      b &= ((a || d) ^ c) == 1;
  }

What is happening is that forwprop4 changes the IR

  _38 = (signed int) _16;
  _59 = -_38;
  _65 = () _59;

to the incorrect

  _55 = () _16;
  _65 = -_55;

[Bug tree-optimization/112736] New: vectorizer is introducing out of bounds memory access

2023-11-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

Bug ID: 112736
   Summary: vectorizer is introducing out of bounds memory access
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function (from gcc.dg/torture/pr68379.c)

  int a, b[3], c[3][5];

  void
  fn1 ()
  {
    int e;
    for (a = 2; a >= 0; a--)
      for (e = 0; e < 4; e++)
        c[a][e] = b[a];
  }

generates out-of-bounds memory accesses (where the three movdqu instructions
read 1, 2, and 3 elements before b) when compiled with -O3 for x86_64:

  fn1:
    movdqu  b-4(%rip), %xmm1
    movdqu  b-8(%rip), %xmm2
    movl    $-1, a(%rip)
    movdqu  b-12(%rip), %xmm3
    pshufd  $255, %xmm1, %xmm0
    movups  %xmm0, c+40(%rip)
    pshufd  $255, %xmm2, %xmm0
    movups  %xmm0, c+20(%rip)
    pshufd  $255, %xmm3, %xmm0
    movaps  %xmm0, c(%rip)
    ret

The vector operations were introduced by the "vect" pass.

[Bug tree-optimization/111668] [12/13 Regression] vrp2 (match and simplify) introduces invalid wide signed Boolean values

2023-10-08 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668

--- Comment #8 from Krister Walfridsson  ---
I still see negation of a wide signed Boolean in the IR for this function. But
now it is forwprop4 that changes

  _38 = (signed int) _16;
  _43 = -_38;
  _66 = () _43;

to

  _56 = () _16;
  _66 = -_56;

[Bug tree-optimization/111668] New: vrp2 introduces invalid wide Boolean values

2023-10-02 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111668

Bug ID: 111668
   Summary: vrp2 introduces invalid wide Boolean values
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vrp2 pass introduces an invalid wide Boolean when compiling the function

  int *a, b, c, d;
  void
  foo (void)
  {
    for (; d <= 0; d++)
      b &= ((a || d) ^ c) == 1;
  }

What is happening is that vrp2 changes the IR

  _Bool _16;
   _66;

  gimple_assign 

to the incorrect

  _Bool _16;
   _38;
   _66;

  gimple_assign 
  gimple_assign

[Bug analyzer/104940] RFE: integrate analyzer with an SMT solver

2023-09-30 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104940

Krister Walfridsson  changed:

   What|Removed |Added

 CC||kristerw at gcc dot gnu.org

--- Comment #7 from Krister Walfridsson  ---
I have released a new version of my tool doing GIMPLE IR to SMT conversion.
This is now written in C++, and converts a bigger subset of GIMPLE. The code is
available at https://github.com/kristerw/smtgcc

[Bug tree-optimization/111494] New: Signed overflow introduced by vectorizer

2023-09-20 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111494

Bug ID: 111494
   Summary: Signed overflow introduced by vectorizer
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vectorizer changes the order of additions when vectorizing the loop below,
but it is not changing the arithmetic to be unsigned, so it introduces new
signed overflows that were not in the original program.

  int a[32];
  int foo(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
      sum += a[i];
    return sum;
  }
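
The effect of the reordering can be modelled by comparing a sequential sum
with per-lane partial sums. This is a sketch under assumptions: the vectorized
loop is modelled as `lanes` independent accumulators that are added together
at the end, the names and the demo_input array are hypothetical, and overflow
is detected with the GCC/Clang builtin __builtin_add_overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Input for which the order of the additions matters. */
static const int demo_input[8] = { INT_MAX, -1, 0, 0, 1, 0, 0, 0 };

/* The additions in source order: ((a[0] + a[1]) + a[2]) + ... */
bool sequential_sum_overflows(const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (__builtin_add_overflow(sum, a[i], &sum))
      return true;
  return false;
}

/* The additions in vector order: lane k accumulates a[k], a[k + lanes],
   ..., and the lanes are summed afterwards. */
bool lanewise_sum_overflows(const int *a, int n, int lanes)
{
  int total = 0;
  for (int lane = 0; lane < lanes; lane++)
    {
      int partial = 0;
      for (int i = lane; i < n; i += lanes)
        if (__builtin_add_overflow(partial, a[i], &partial))
          return true;
      if (__builtin_add_overflow(total, partial, &total))
        return true;
    }
  return false;
}
```

With demo_input and four lanes, lane 0 computes INT_MAX + 1, although the
sequential sum never exceeds INT_MAX.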

[Bug tree-optimization/111280] New: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is false

2023-09-03 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111280

Bug ID: 111280
   Summary: CLZ(0) generated when CLZ_DEFINED_VALUE_AT_ZERO is
false
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC may generate an internal call to CLZ with 0 when CLZ_DEFINED_VALUE_AT_ZERO
is false, which can be seen with gcc.c-torture/execute/920501-6.c where sccp
changes a loop to

  _36 = t_10(D) != 0;
  _35 = .CLZ (t_10(D));
  _34 = 63 - _35;
  _33 = (unsigned int) _34;
  _32 = (long long unsigned int) _33;
  _31 = _32 + 1;
  b_38 = _36 ? _31 : 1;

The value _35 is not used when t_10(D) is 0, so it may be reasonable to allow
this. But the value _35 may then be any value, so _34 may overflow. I.e., the
calculation
  _34 = 63 - _35;
must be changed to be done unsigned.

And the ranges calculated during the dom3 pass claim that _35 has a range
  _35  : [irange] int [0, 63]
which is also wrong.
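
The overflow concern can be made concrete with the GCC/Clang builtin
__builtin_sub_overflow. This is a sketch: the function names are hypothetical,
and clz_result stands for whatever value _35 happens to hold when t is 0.

```c
#include <limits.h>
#include <stdbool.h>

/* Would the signed subtraction _34 = 63 - _35 overflow for this _35? */
bool signed_sub_overflows(int clz_result)
{
  int r;
  return __builtin_sub_overflow(63, clz_result, &r);
}

/* The same subtraction done in unsigned arithmetic, which wraps instead
   of overflowing. */
unsigned int unsigned_sub(int clz_result)
{
  return 63u - (unsigned int)clz_result;
}
```

For results in [0, 63] the signed form is fine, but for an arbitrary value
such as INT_MIN it overflows, while the unsigned form is defined for every
input.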

[Bug tree-optimization/111257] New: new signed overflow after vectorizer

2023-08-31 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111257

Bug ID: 111257
   Summary: new signed overflow after vectorizer
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vectorizer is not removing the original scalar calculations, and they may
overflow after vectorization.

This can be seen with

  int a[8];

  void foo(void)
  {
    for (int i = 0; i < 8; i++)
      a[i] = a[i] + 5;
  }

The IR for the loop before vectorization looks like

   [local count: 954449104]:
  # i_10 = PHI 
  # ivtmp_4 = PHI 
  _1 = a[i_10];
  _2 = _1 + 5;
  a[i_10] = _2;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  if (ivtmp_3 != 0)
goto ; [87.50%]
  else
goto ; [12.50%]

   [local count: 835156385]:
  goto ; [100.00%]

and it is vectorized to

   [local count: 238585440]:
  # i_10 = PHI 
  # ivtmp_4 = PHI 
  # vectp_a.4_9 = PHI 
  # vectp_a.8_16 = PHI 
  # ivtmp_19 = PHI 
  vect__1.6_13 = MEM  [(int *)vectp_a.4_9];
  _1 = a[i_10];
  vect__2.7_15 = vect__1.6_13 + { 5, 5, 5, 5 };
  _2 = _1 + 5;
  MEM  [(int *)vectp_a.8_16] = vect__2.7_15;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  vectp_a.4_8 = vectp_a.4_9 + 16;
  vectp_a.8_17 = vectp_a.8_16 + 16;
  ivtmp_20 = ivtmp_19 + 1;
  if (ivtmp_20 < 2)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 119292723]:
  goto ; [100.00%]

This vectorized loop still reads _1 from a[i_10] and adds 5 to it, so the second
loop iteration will add 5 to the value of a[1]. But the first iteration has
already added 5 to a[1], so we are now doing a different calculation compared
to the original loop, and this can overflow even if the original did not.

[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth

2023-08-05 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #6 from Krister Walfridsson  ---
One more similar case (that may be the same as comment #3):

int g;

void foo(int a, int b, int c, int d, int e)
{
  if ((10 + a) * b)
    {
      g = (c || (g >> d)) << 1;
    }
}

In this case, reassoc1 optimizes the IR for
  c || (g >> d)
to do 
  (c | (g >> d)) != 0
and we are now always doing the shift, even when c is true.

[Bug tree-optimization/110760] slp introduces new overflow arithmetic

2023-07-20 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760

--- Comment #3 from Krister Walfridsson  ---
(In reply to Andrew Pinski from comment #1)
> I thought we decided that vector types don't apply the overflow rules and
> always just wrap ...

That makes sense. But on the other hand, PR 110495 is a similar issue, and that
was fixed...

And TYPE_OVERFLOW_WRAPS should return true for integer vectors if they always
wrap (or is it only valid for scalars? But ANY_INTEGRAL_TYPE_P is careful to
handle vectors and complex numbers too, so I thought the
ANY_INTEGRAL_TYPE_CHECK in TYPE_OVERFLOW_WRAPS means that it works for vectors
too).

[Bug tree-optimization/110760] New: slp introduces new wrapped arithmetic

2023-07-20 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110760

Bug ID: 110760
   Summary: slp introduces new wrapped arithmetic
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Consider the following function from gcc.dg/vect/bb-slp-layout-5.c:

int a[4], b[4], c[4];

void f1()
{
  a[0] = b[3] - c[3];
  a[1] = b[2] + c[2];
  a[2] = b[1] - c[1];
  a[3] = b[0] + c[0];
}

This is vectorized by slp2:
  vector(4) int vect__1.5;
  vector(4) int vect__2.8;
  vector(4) int vect__12.10;
  vector(4) int vect__3.9;
  vector(4) int _22;
  vect__1.5_18 = MEM  [(int *)];
  vect__2.8_19 = MEM  [(int *)];
  vect__12.10_21 = vect__1.5_18 + vect__2.8_19;
  vect__3.9_20 = vect__1.5_18 - vect__2.8_19;
  _22 = VEC_PERM_EXPR ;
  MEM  [(int *)] = _22;

But this introduces new calculations in the temporary vectors of the unused
elements:
  b[0] - c[0];
  b[1] + c[1];
  b[2] - c[2];
  b[3] + c[3];
and these calculations may wrap for input where the original program did not
wrap.
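
An input that triggers this can be checked with the GCC/Clang overflow
builtins. This is a sketch: the function names and the demo_b/demo_c arrays
are hypothetical, and the slp2 code is modelled as a full four-lane add plus a
full four-lane subtract.

```c
#include <limits.h>
#include <stdbool.h>

/* b[0] + c[0] fits in an int, but b[0] - c[0] does not. */
static const int demo_b[4] = { -2, 0, 0, 0 };
static const int demo_c[4] = { INT_MAX, 0, 0, 0 };

/* The four operations in the source. */
bool source_overflows(const int b[4], const int c[4])
{
  int t;
  return __builtin_sub_overflow(b[3], c[3], &t)
         || __builtin_add_overflow(b[2], c[2], &t)
         || __builtin_sub_overflow(b[1], c[1], &t)
         || __builtin_add_overflow(b[0], c[0], &t);
}

/* The eight lane operations of the vectorized code: every lane is both
   added and subtracted before VEC_PERM_EXPR selects four results. */
bool slp_overflows(const int b[4], const int c[4])
{
  int t;
  for (int k = 0; k < 4; k++)
    if (__builtin_add_overflow(b[k], c[k], &t)
        || __builtin_sub_overflow(b[k], c[k], &t))
      return true;
  return false;
}
```

For demo_b and demo_c the source computes only b[0] + c[0] = INT_MAX - 2 in
element 0, while the vectorized form also computes b[0] - c[0], which is below
INT_MIN.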

[Bug tree-optimization/110554] New: more invalid wide Boolean values

2023-07-04 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110554

Bug ID: 110554
   Summary: more invalid wide Boolean values
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The fix for PR 110487 improved the situation, but my tool still finds some
cases where GCC generates invalid wide Boolean values.

One such case can be seen in gcc.c-torture/compile/pr104499.c:

  typedef int __attribute__((__vector_size__ (8 * sizeof (int)))) V;

  V v;

  void
  foo (void)
  {
    v = ((1 | v) != 1);
  }

Here veclower2 is introducing code

   _8;
   _10;
  ...
  gimple_assign 
  gimple_assign 


More examples of this failure can be seen in gcc.c-torture/compile/pr108237.c
and gcc.c-torture/compile/pr54713-1.c

[Bug tree-optimization/110541] New: Invalid VEC_PERM_EXPR mask element size

2023-07-04 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110541

Bug ID: 110541
   Summary: Invalid VEC_PERM_EXPR mask element size
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

tree.def says:
  The number of MASK elements must be the same with the
  number of elements in V0 and V1.  The size of the inner type
  of the MASK and of the V0 and V1 must be the same.

But tree-vectorizer creates permutations where the MASK element size is
different than for V0 and V1, such as

   vector(8) unsigned short _79;
   ...
  _79 = VEC_PERM_EXPR <_78, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 4, 5, 6, 7, 8, 9, 10,
11 }>;

where the MASK elements are of a 64-bit type.

This can be seen when compiling the following function (from
gcc.c-torture/compile/2717-1.c) as "gcc -S -O3" for x86_64:

short
inner_product (short *a, short *b)
{
  int i;
  short sum = 0;

  for (i = 9; i >= 0; i--)
    sum += (*a++) * (*b++);

  return sum;
}

[Bug tree-optimization/110495] New: fre introduces signed wrap for vector

2023-06-30 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110495

Bug ID: 110495
   Summary: fre introduces signed wrap for vector
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The following function (from gcc.dg/tree-ssa/addadd-2.c)

typedef int S __attribute__((vector_size(64)));
void j(S*x){
  *x += __INT_MAX__;
  *x += __INT_MAX__;
}

is optimized by fre1 to 

void j (S * x)
{
  vector(16) int _1;
  vector(16) int _2;
  vector(16) int _4;

   :
  _1 = *x_6(D);
  _2 = _1 + { 2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
2147483647, 2147483647, 2147483647, 2147483647, 2147483647, 2147483647,
2147483647, 2147483647, 2147483647, 2147483647, 2147483647 };
  *x_6(D) = _2;
  _4 = _1 + { -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF),
-2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF), -2(OVF)
};
  *x_6(D) = _4;
  return;
}

which has signed wrap for the cases where the original did not wrap.
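
A single lane is enough to show the difference. This is a sketch: the function
names are hypothetical, and overflow is detected with the GCC/Clang builtin
__builtin_add_overflow, which also stores the wrapped result.

```c
#include <limits.h>
#include <stdbool.h>

/* One lane of the source: two separate additions of INT_MAX. */
bool two_adds_overflow(int x, int *result)
{
  int t;
  bool first = __builtin_add_overflow(x, INT_MAX, &t);
  bool second = __builtin_add_overflow(t, INT_MAX, result);
  return first || second;
}

/* One lane of the fre1 output: a single addition of the folded
   constant INT_MAX + INT_MAX, which wraps to -2. */
bool folded_add_overflows(int x, int *result)
{
  return __builtin_add_overflow(x, -2, result);
}
```

For x = INT_MIN the source computes -1 and then INT_MAX - 1 without any
overflow, while the folded form computes INT_MIN + (-2). Both give the same
value under wrapping arithmetic, so the fold is only valid if the vector
addition is treated as wrapping.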

[Bug tree-optimization/110487] New: invalid wide Boolean value

2023-06-29 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110487

Bug ID: 110487
   Summary: invalid wide Boolean value
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vrp2 pass generates IR where a wide signed Boolean may get the value 1 (in
addition to the valid 0 and -1).

This can be seen in gcc.c-torture/compile/pr53410-1.c

  int *a, b, c, d;

  void
  foo (void)
  {
    for (; d <= 0; d++)
      b &= ((a || d) ^ c) == 1;
  }

when compiled as "gcc -O3". The vectorizer has created (correct) code

  _Bool _16;
   _66;
  ...
  _16 = a.1_1 != 0B;
  _66 = _16 ? -1 : 0;

which then is transformed by vrp2 to

  _Bool _16;
   _38;
   _66;
  ...
  _16 = a.1_1 != 0B;
  _38 = () _16;
  _66 = -_38;

_16 can be both true/false depending on the values of some global variables, so
_38 has the value 0 or -1, and _66 has the value 0 or 1.
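
The two forms can be compared in a scalar model. This is a sketch: the wide
signed Boolean is modelled as a plain int, the conversion of a true _Bool is
taken to produce -1 as described above, and the function names are
hypothetical.

```c
#include <stdbool.h>

/* The vectorizer's form: _66 = _16 ? -1 : 0 */
int select_form(bool cond)
{
  return cond ? -1 : 0;
}

/* The vrp2 form: _38 = (cast) _16; _66 = -_38 */
int negated_cast_form(bool cond)
{
  int widened = cond ? -1 : 0;  /* models the cast: true becomes -1 */
  return -widened;
}
```

The forms agree for false, but for true the negation produces 1, which is not
a valid value for a signed Boolean.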

[Bug tree-optimization/110434] New: tree-nrv introduces incorrect CLOBBER(eol)

2023-06-27 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434

Bug ID: 110434
   Summary: tree-nrv introduces incorrect CLOBBER(eol)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The tree-nrv pass may introduce incorrect CLOBBER(eol) of the form
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

One example of this can be seen by compiling gcc.c-torture/execute/921204-1.c
for x86 using the flags "-O -m32", where it changes the IR

  union bu o;
  ...
  o = i;
  MEM[(union  *)].b18 = _11;
  MEM[(union  *)].b20 = _11;
  <retval> = o;
  o ={v} {CLOBBER(eol)};
  return <retval>;

to just use <retval> instead of o

  union bu o [value-expr: <retval>];
  ...
  <retval> = i;
  MEM[(union  *)&<retval>].b18 = _11;
  MEM[(union  *)&<retval>].b20 = _11;
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

so the CLOBBER(eol) now refers to <retval>.

[Bug tree-optimization/109626] New: forwprop introduces new signed multiplication UB

2023-04-25 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109626

Bug ID: 109626
   Summary: forwprop introduces new signed multiplication UB
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Consider the function

int foo(_Bool v0, unsigned v1, unsigned v2)
{
  signed int v5 = v1 >> v2;
  unsigned v6 = -v1;
  unsigned int v7 = v2 - v0;
  return (int)v7 * (int)v6;
}

This does not invoke undefined behavior when called as foo(0, 0x80000000, 1),
but forwprop1 optimizes this to the equivalent of

int foo(_Bool v0, unsigned v1, unsigned v2)
{
  signed int v5 = v1 >> v2;
  unsigned int v7 = v0 - v2;
  return (int)v7 * (int)v1;
}

where the signed multiplication now is calculating -1 * INT_MIN.
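
The difference can be checked with the GCC/Clang overflow builtins. The sketch below is not taken from the report: the helper names are hypothetical, and it relies on GCC's documented modulo behavior when converting an out-of-range unsigned value to a signed type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does the 32-bit signed product a * b overflow?  */
static bool mul_overflows (int32_t a, int32_t b)
{
  int32_t r;
  return __builtin_mul_overflow (a, b, &r);
}

/* Operands of the multiplication in the original foo.  */
static bool original_overflows (bool v0, uint32_t v1, uint32_t v2)
{
  uint32_t v6 = -v1;      /* unsigned negation, wraps */
  uint32_t v7 = v2 - v0;
  return mul_overflows ((int32_t) v7, (int32_t) v6);
}

/* Operands of the multiplication after forwprop1.  */
static bool optimized_overflows (bool v0, uint32_t v1, uint32_t v2)
{
  uint32_t v7 = v0 - v2;
  return mul_overflows ((int32_t) v7, (int32_t) v1);
}
```

With v0 = 0, v2 = 1 and v1 having only its top bit set, the original multiplies 1 by INT_MIN, which is representable, while the optimized form multiplies -1 by INT_MIN, which is not.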

[Bug tree-optimization/108625] New: forwprop introduces new UB

2023-02-01 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108625

Bug ID: 108625
   Summary: forwprop introduces new UB
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Consider the function

  unsigned char foo(int x)
  {
    int t = -x;
    unsigned char t1 = t;
    unsigned char t2 = t;
    return t1 + t2;
  }

This does not invoke undefined behavior when called as foo(0x40000001),
but forwprop1 optimizes this to

  unsigned char foo (int x)
  {
int t;
unsigned char _5;
int _7;

 :
t_2 = -x_1(D);
_7 = t_2 - x_1(D);
_5 = (unsigned char) _7;
return _5;
  }

where _7 has signed overflow for x = 0x40000001.
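
A small check using the GCC/Clang overflow builtins illustrates this. It is a sketch with hypothetical helper names, not code from the report; foo_reference shows one way to compute the same result in unsigned arithmetic, where wrapping is well defined.

```c
#include <stdbool.h>

/* Original: the only signed int operation is t = -x.  */
static bool original_has_signed_overflow (int x)
{
  int t;
  return __builtin_sub_overflow (0, x, &t);
}

/* Optimized: t = -x followed by _7 = t - x, both in int.  */
static bool optimized_has_signed_overflow (int x)
{
  int t, r;
  if (__builtin_sub_overflow (0, x, &t))
    return true;
  return __builtin_sub_overflow (t, x, &r);
}

/* Reference result computed in unsigned arithmetic; only the low
   eight bits of 2 * -x matter for the returned value.  */
static unsigned char foo_reference (int x)
{
  unsigned int t = -(unsigned int) x;
  return (unsigned char) (t + t);
}
```

For an input such as x = 0x40000001, -x is representable but -x - x is not, so only the optimized form overflows.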

[Bug tree-optimization/108440] rotate optimization may introduce new UB

2023-01-17 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

--- Comment #4 from Krister Walfridsson  ---
I misread the comment -- it describes a possible future improvement (that I
believe is not allowed). But the committed patch seems to be correct.

[Bug tree-optimization/106523] [10/11/12 Regression] forwprop miscompile

2023-01-17 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

--- Comment #8 from Krister Walfridsson  ---
This fixed most of the rotate issues my translation validation tool found. I
assume the remaining issues are due to a different (but similar) bug, so I
opened Bug 108440 for those. 

But the issue in Bug 108440 seems similar to the "Y equal to B case" discussed
in comment #6, so I believe the comment is slightly wrong (as the rotate
instruction will invoke UB when Y is equal to B).

[Bug tree-optimization/108440] rotate optimization may introduce new UB

2023-01-17 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

--- Comment #3 from Krister Walfridsson  ---
Hmm. I think this is the "Y equal to B case" from bug 106523. I.e., the bugfix
is not correct...

[Bug tree-optimization/108440] rotate optimization may introduce new UB

2023-01-17 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

--- Comment #2 from Krister Walfridsson  ---
No, bug 106523 is a different issue (I have tested with a compiler that has
that fixed).

[Bug tree-optimization/108440] New: rotate optimization may introduce new UB

2023-01-17 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108440

Bug ID: 108440
   Summary: rotate optimization may introduce new UB
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC optimizes shift instructions to rotate in a way that may make
the optimized IR invoke UB for cases where the original did not.
This can be seen in the IR for f5 from c-c++-common/rotate-1.c:

  unsigned short int
  f5 (unsigned short int x, unsigned int y)
  {
    return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_SHORT__ - y));
  }

The IR is doing 32-bit shifts, so y = 16 does not invoke UB:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
int _1;
int _2;
signed short _3;
int _4;
unsigned int _5;
int _6;
signed short _7;
signed short _8;
short unsigned int _11;

 :
_1 = (int) x_9(D);
_2 = _1 << y_10(D);
_3 = (signed short) _2;
_4 = (int) x_9(D);
_5 = 16 - y_10(D);
_6 = _4 >> _5;
_7 = (signed short) _6;
_8 = _3 | _7;
_11 = (short unsigned int) _8;
return _11;
  }

But forwprop1 changes this to a 16-bit rotate which invokes UB for y=16:

  short unsigned int f5 (short unsigned int x, unsigned int y)
  {
short unsigned int _13;

 :
_13 = x_9(D) r<< y_10(D);
return _13;
  }
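
For comparison, a rotate that stays well defined for every y can be written by masking both shift counts. This is a well-known idiom and not the fix that was applied to GCC; rotl16 and f5_model are hypothetical names, and f5_model is only meaningful for y in the range 0 to 16.

```c
#include <stdint.h>

/* 16-bit rotate left with both shift counts reduced modulo 16,
   so no shift count ever reaches the operand width.  */
static uint16_t rotl16 (uint16_t x, unsigned int y)
{
  unsigned int n = y & 15u;
  return (uint16_t) (((unsigned int) x << n)
                     | ((unsigned int) x >> ((16u - n) & 15u)));
}

/* Model of f5 using 32-bit unsigned shifts; valid for 0 <= y <= 16.  */
static uint16_t f5_model (uint16_t x, unsigned int y)
{
  return (uint16_t) (((uint32_t) x << y) | ((uint32_t) x >> (16u - y)));
}
```

Both functions return x unchanged for y = 16, which is the case the 16-bit rotate instruction does not cover.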

[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth

2022-09-30 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #3 from Krister Walfridsson  ---
A similar case is

int r1, r2;

int foo(int a, int s1, int s2)
{
  if (a & (1 << s1))
    return r1;
  if (a & (1 << s2))
    return r1;
  return r2;
}

where reassoc2 optimizes this to always shift by s2.

[Bug tree-optimization/106990] New: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd

2022-09-20 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106990

Bug ID: 106990
   Summary: Missing TYPE_OVERFLOW_SANITIZED checks in match.pd
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

When UBSan is used, match.pd disables simplifications that can remove UB. But
two simplifications are missing TYPE_OVERFLOW_SANITIZED checks, making the two
tests below fail to report UB when compiled with -fsanitize=undefined.

/* (~X - ~Y) -> Y - X.  */
int main(void)
{
  volatile int x = -1956816001;
  volatile int y = 1999200512;
  return ~x - ~y;
}

/* -x & 1 -> x & 1.  */
int main(void)
{
  volatile int x = 0x80000000;
  return -x & 1;
}

[Bug tree-optimization/106884] ifcombine may move shift so it shifts more than bitwidth

2022-09-08 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

--- Comment #2 from Krister Walfridsson  ---
This optimization is invalid if (int)1 << 33 is _not_ undefined behavior in
GIMPLE!

Consider an architecture where (int)1 << 33 evaluates to 0. foo(2, 1, 33)
evaluates to 0 for the original GIMPLE, but it evaluates to 2 in the optimized
IR.
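
The argument can be replayed in C by modelling such an architecture explicitly. In this sketch shl1 is a hypothetical helper that returns 0 for out-of-range shift counts, and the two functions mirror the original foo and the ifcombine result from the initial report for this bug.

```c
/* Hypothetical target behavior: 1 << s evaluates to 0 when s is
   outside the range 0..31.  */
static unsigned int shl1 (int s)
{
  return (s >= 0 && s < 32) ? 1u << s : 0u;
}

/* The original control flow: the second shift is only evaluated
   when the first test succeeds.  */
static int foo_original (int x, int a, int b)
{
  unsigned int c = shl1 (a);
  if ((unsigned int) x & c)
    if ((unsigned int) x & shl1 (b))
      return 2;
  return 0;
}

/* The combined form: both shifts are evaluated unconditionally and
   the two tests are merged into one mask comparison.  */
static int foo_combined (int x, int a, int b)
{
  unsigned int m = shl1 (b) | shl1 (a);
  return ((unsigned int) x & m) == m ? 2 : 0;
}
```

Under this model the two forms agree for in-range shift counts but differ for foo(2, 1, 33), because an out-of-range shift contributes no bit to the combined mask.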

[Bug sanitizer/106885] New: -(a-b) is folded to b-a before the UBSAN pass is run

2022-09-07 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106885

Bug ID: 106885
   Summary: -(a-b) is folded to b-a before the UBSAN pass is run
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

GCC is folding -(a-b) to b-a before the UBSAN pass is run, which may hide
undefined behavior from the sanitizer.

This can be seen by the following program, which invokes undefined behavior
that is not detected by -fsanitize=undefined

int main(void)
{
  volatile int a = 0;
  volatile int b = 0x80000000;
  return -(a - b);
}
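
Why the fold hides the problem can be seen with the GCC/Clang overflow builtins. The helper names below are hypothetical, and the check is only a sketch of the arithmetic, not of what the sanitizer instruments.

```c
#include <limits.h>
#include <stdbool.h>

/* -(a - b): overflow in either the subtraction or the negation.  */
static bool negated_difference_overflows (int a, int b)
{
  int d, r;
  if (__builtin_sub_overflow (a, b, &d))
    return true;
  return __builtin_sub_overflow (0, d, &r);
}

/* The folded form b - a.  */
static bool folded_overflows (int a, int b)
{
  int r;
  return __builtin_sub_overflow (b, a, &r);
}
```

With a = 0 and b = INT_MIN, the subtraction a - b overflows, while b - a is simply INT_MIN, so nothing is left for the sanitizer to report after folding.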

[Bug tree-optimization/106884] New: ifcombine may move shift so it shifts more than bitwidth

2022-09-07 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106884

Bug ID: 106884
   Summary: ifcombine may move shift so it shifts more than
bitwidth
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function foo from gcc.dg/tree-ssa/ssa-ifcombine-1.c can be called
as foo(1, 1, 33) without invoking undefined behavior

int foo (int x, int a, int b)
{
  int c = 1 << a;
  if (x & c)
    if (x & (1 << b))
      return 2;
  return 0;
}

But ifcombine transforms this to

int foo (int x, int a, int b)
{
   int c;
   int _4;
   int _10;
   int _11;
   int _12;
   int _13;

   :
   _10 = 1 << b_8(D);
   _11 = 1 << a_5(D);
   _12 = _10 | _11;
   _13 = x_7(D) & _12;
   if (_12 == _13)
 goto ;
   else
 goto ;

   :

   :
   # _4 = PHI <2(3), 0(2)>
   return _4;
}

and this will now calculate 1 << 33 unconditionally for _10.

[Bug tree-optimization/106883] New: SLSR may generate signed wrap

2022-09-07 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106883

Bug ID: 106883
   Summary: SLSR may generate signed wrap
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

SLSR may generate new signed wrap for cases where the original did not wrap.
This can be seen in the function f from gcc.dg/tree-ssa/slsr-19.c:

int
f (int c, int s)
{
  int x1, x2, y1, y2;

  y1 = c + 2;
  x1 = s * y1;
  y2 = y1 + 2;
  x2 = s * y2;
  return x1 + x2;
}

SLSR optimizes this to

int f (int c, int s)
{
   int y1;
   int x2;
   int x1;
   int _7;
   int slsr_9;

   :
   y1_2 = c_1(D) + 2;
   x1_4 = y1_2 * s_3(D);
   slsr_9 = s_3(D) * 2;
   x2_6 = x1_4 + slsr_9;
   _7 = x1_4 + x2_6;
   return _7;

Calling f(-3, 0x75181005) does not make any operation wrap in the original
function, but slsr_9 overflows in the optimized code.
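
The claim can be checked step by step with the GCC/Clang overflow builtins. This is a sketch with hypothetical function names; each builtin call corresponds to one statement of the source or of the optimized IR.

```c
#include <stdbool.h>

/* True if any operation of the original f wraps.  */
static bool original_wraps (int c, int s)
{
  int y1, x1, y2, x2, r;
  return __builtin_add_overflow (c, 2, &y1)
         || __builtin_mul_overflow (s, y1, &x1)
         || __builtin_add_overflow (y1, 2, &y2)
         || __builtin_mul_overflow (s, y2, &x2)
         || __builtin_add_overflow (x1, x2, &r);
}

/* True if any operation of the SLSR-optimized f wraps.  */
static bool optimized_wraps (int c, int s)
{
  int y1, x1, slsr, x2, r;
  return __builtin_add_overflow (c, 2, &y1)
         || __builtin_mul_overflow (y1, s, &x1)
         || __builtin_mul_overflow (s, 2, &slsr)
         || __builtin_add_overflow (x1, slsr, &x2)
         || __builtin_add_overflow (x1, x2, &r);
}
```

For c = -3 the original only computes -s, s and their sum, whereas the optimized code computes s * 2, which does not fit in int for s = 0x75181005.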

[Bug tree-optimization/106744] New: phiopt miscompiles min/max

2022-08-25 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106744

Bug ID: 106744
   Summary: phiopt miscompiles min/max
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC miscompiles the following test at -O1 or higher optimization levels:

#include <stdint.h>

__attribute__((noinline)) uint8_t
three_minmax1 (uint8_t xc, uint8_t xm, uint8_t xy) {
  uint8_t  xk;
  if (xc > xm) {
    xk = (uint8_t) (xc < xy ? xc : xy);
  } else {
    xk = (uint8_t) (xm < xy ? xm : xy);
  }
  return xk;
}

int
main (void)
{
  volatile uint8_t xy = 255;
  volatile uint8_t xm = 0;
  volatile uint8_t xc = 255;
  if (three_minmax1 (xc, xm, xy) != 255)
    __builtin_abort ();
  return 0;
}

What is happening is that phiopt transforms three_minmax1 to

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xm_3(D)>;
  return _9;

instead of the intended

  _7 = MAX_EXPR ;
  _9 = MIN_EXPR <_7, xy_4(D)>;
  return _9;
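
The two forms can be compared against the source in plain C. This sketch assumes that the MAX_EXPR operands, which are not visible in the dumps above, are xc and xm; the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t min_u8 (uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t max_u8 (uint8_t a, uint8_t b) { return a > b ? a : b; }

/* The source function, written out as in the test case.  */
static uint8_t source_form (uint8_t xc, uint8_t xm, uint8_t xy)
{
  return xc > xm ? min_u8 (xc, xy) : min_u8 (xm, xy);
}

/* Intended result: MIN (MAX (xc, xm), xy).  */
static uint8_t intended_form (uint8_t xc, uint8_t xm, uint8_t xy)
{
  return min_u8 (max_u8 (xc, xm), xy);
}

/* What phiopt produced: the MIN uses xm instead of xy.  */
static uint8_t miscompiled_form (uint8_t xc, uint8_t xm, uint8_t xy)
{
  (void) xy;
  return min_u8 (max_u8 (xc, xm), xm);
}
```

For the inputs used in main (xc = 255, xm = 0, xy = 255) the source and intended forms return 255, while the miscompiled form collapses to xm and returns 0.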

[Bug tree-optimization/106523] New: forwprop miscompile

2022-08-04 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523

Bug ID: 106523
   Summary: forwprop miscompile
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function f7 from testsuite/c-c++-common/rotate-2.c is miscompiled by
forwprop. This can be seen by running the function as

__attribute__((noinline)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}

int
main (void)
{
  volatile unsigned char x = 152;
  volatile unsigned int y = 19;
  if (f7(x, y) != 4)
    __builtin_abort ();

  return 0;
}

This fails at -O1 and higher optimization levels.

What is happening here is that forwprop1 has optimized the function
to
  _10 = x_7(D) r<< y_9(D);
  return _10;
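
To see why a rotate is not equivalent here, the source expression can be compared with an 8-bit rotate in C. This is a sketch: rotl8 is a hypothetical helper that assumes the rotate count is reduced modulo 8, which the report does not state.

```c
/* The source expression, evaluated in 32-bit unsigned arithmetic
   and truncated to 8 bits on return.  */
static unsigned char f7_reference (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (unsigned char) ((t << y) | (t >> ((-y) & 7)));
}

/* Hypothetical 8-bit rotate left with the count taken modulo 8.  */
static unsigned char rotl8 (unsigned char x, unsigned int y)
{
  unsigned int n = y & 7u;
  return (unsigned char) (((unsigned int) x << n)
                          | ((unsigned int) x >> ((8u - n) & 7u)));
}
```

For x = 152 and y = 19 the left shift moves every bit of x out of the low byte, so only t >> 5 contributes and the result is 4. A rotate by 19 modulo 8 instead brings bits back into the low byte and gives 196.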

[Bug tree-optimization/106513] bswap is incorrectly generated

2022-08-03 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

--- Comment #2 from Krister Walfridsson  ---
(In reply to Andreas Schwab from comment #1)
> This subexpression has undefined behaviour: (((int64_t) 0xff) << 56).

I thought that was allowed in GCC as the manual says
(https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Integers-implementation.html#Integers-implementation)
"As an extension to the C language, GCC does not use the latitude given in C99
and C11 only to treat certain aspects of signed ‘<<’ as undefined."

If not, what behavior does the manual refer to?

[Bug tree-optimization/106513] New: bswap is incorrectly generated

2022-08-03 Thread kristerw at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106513

Bug ID: 106513
   Summary: bswap is incorrectly generated
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC may incorrectly generate bswap instructions for code not doing a correct
swap. This can be seen by running the function from testsuite/gcc.dg/pr40501.c
as

typedef long int int64_t;

__attribute__((noinline)) int64_t
swap64 (int64_t n)
{
  return (((n & (((int64_t) 0xff) )) << 56) |
  ((n & (((int64_t) 0xff) << 8)) << 40) |
  ((n & (((int64_t) 0xff) << 16)) << 24) |
  ((n & (((int64_t) 0xff) << 24)) << 8) |
  ((n & (((int64_t) 0xff) << 32)) >> 8) |
  ((n & (((int64_t) 0xff) << 40)) >> 24) |
  ((n & (((int64_t) 0xff) << 48)) >> 40) |
  ((n & (((int64_t) 0xff) << 56)) >> 56));
}

int main (void)
{
  volatile int64_t n = 0x8000000000000000l;

  if (swap64(n) != 0xffffffffffffff80l)
    __builtin_abort ();

  return 0;
}

This fails at -Os and higher optimization levels.

[Bug tree-optimization/85762] New: [8/9 Regression] range-v3 abstraction overhead not optimized away

2018-05-12 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85762

Bug ID: 85762
   Summary: [8/9 Regression] range-v3 abstraction overhead not
optimized away
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Created attachment 44124
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44124&action=edit
preprocessed source code for run_range()

GCC 8 is less aggressive than earlier versions when eliminating abstraction
overhead in the range-v3 library, which can be seen with the function

  #include 
  #include 

  long run_range(std::vector<int> const &lengths, long to_find)
  {
    auto const found_index = ranges::distance(lengths
        | ranges::view::transform(ranges::convert_to<long>{})
        | ranges::view::partial_sum()
        | ranges::view::take_while([=](auto const i) {
            return !(to_find < i);
          }));
    return found_index;
  }


GCC 7 compiled the loop to

   [10.87%]:
  # it$_M_current_41 = PHI <_6(4), _27(8)>
  # it$16_26 = PHI <it$16_24(4), _31(8)>
  _53 = to_find_2(D) < it$16_26;

   [100.00%]:
  # it$_M_current_23 = PHI <it$_M_current_41(5), _27(7)>
  _20 = _7 == it$_M_current_23;
  _5 = _20 | _53;
  if (_5 != 0)
goto ; [7.36%]
  else
goto ; [92.64%]

   [92.60%]:
  _27 = it$_M_current_23 + 4;
  if (_7 != _27)
goto ; [3.75%]
  else
goto ; [96.25%]

   [3.47%]:
  _29 = MEM[(const int &)it$_M_current_23 + 4];
  _30 = (long int) _29;
  _31 = it$16_26 + _30;
  goto ; [100.00%]

   [7.36%]:
  _33 = (long int) it$_M_current_23;
  _34 = (long int) _6;
  _35 = _33 - _34;
  _36 = _35 /[ex] 4;
  return _36;

while the loop compiled by GCC 8 updates some structures in each iteration

   [local count: 1478210893]:
  # it_47 = PHI <SR.352_183(4), _64(8)>
  # it$16$sum__115 = PHI <SR.353_184(4), _67(8)>
  _42 = to_find_2(D) < it$16$sum__115;

   [local count: 1651554780]:
  # it_30 = PHI <it_47(5), _64(7)>
  _46 = it_30 == SR.355_137;
  _40 = _42 | _46;
  if (_40 != 0)
goto ; [65.00%]
  else
goto ; [35.00%]

   [local count: 577812955]:
  SR.80_62 = MEM[(const struct __normal_iterator &)SR.354_185 + 24];
  MEM[(struct adaptor_cursor *)] = SR.80_62;
  MEM[(struct box *)].value = pos;
  SR.396_209 = MEM[(struct adaptor_cursor *)];
  _64 = it_30 + 4;
  if (_64 != SR.396_209)
goto ; [70.00%]
  else
goto ; [30.00%]

   [local count: 404469068]:
  _65 = MEM[(const int &)it_30 + 4];
  _66 = (long int) _65;
  _67 = _66 + it$16$sum__115;
  goto ; [100.00%]

   [local count: 1073279389]:
  _32 = it_30 - SR.352_183;
  _33 = _32 /[ex] 4;
  D.357125 ={v} {CLOBBER};
  D.311383 ={v} {CLOBBER};
  return _33;

which makes this loop about 10x slower on my computer.

GCC 8 also generates lots of code setting up the function that GCC 7 manages to
eliminate.


This regression was introduced by r255510:

  2017-12-08  Martin Jambor  <mjam...@suse.cz>

PR tree-optimization/83141
* tree-sra.c (contains_vce_or_bfcref_p): Move up in the file, also
test for MEM_REFs implicitely changing types with padding.  Remove
inline keyword.
(build_accesses_from_assign): Added contains_vce_or_bfcref_p checks.


To reproduce the problem, compile the attached file as

  g++ -O2 -S ranges.ii

and notice the difference in the generated code.

[Bug rtl-optimization/85594] New: ICE during expand when compiling with -fwrapv -fopenmp

2018-05-01 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85594

Bug ID: 85594
   Summary: ICE during expand when compiling with -fwrapv -fopenmp
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Compiling gcc/testsuite/gcc.dg/gomp/pr81768-2.c with "-fwrapv -fopenmp" fails
with an ICE:

> gcc -S -fwrapv -fopenmp pr81768-2.c 
during RTL pass: expand
../pr81768-2.c: In function 'foo._omp_fn.1':
../pr81768-2.c:10:9: internal compiler error: in make_decl_rtl, at
varasm.c:1322
 #pragma omp target parallel for schedule(static, 32) collapse(3)
 ^~~
0x5d230c make_decl_rtl(tree_node*)
../../gcc/gcc/varasm.c:1318
0x7c79bc expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.c:9965
0x7d05de expand_expr
../../gcc/gcc/expr.h:280
0x7d05de expand_expr_addr_expr_1
../../gcc/gcc/expr.c:7946
0x7d0465 expand_expr_addr_expr_1
../../gcc/gcc/expr.c:7992
0x7c698d expand_expr_addr_expr
../../gcc/gcc/expr.c:8067
0x7c698d expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.c:11239
0x7433a0 expand_normal
../../gcc/gcc/expr.h:286
0x7433a0 do_compare_and_jump
../../gcc/gcc/dojump.c:1196
0x744253 do_jump_1(tree_code, tree_node*, tree_node*, rtx_code_label*,
rtx_code_label*, profile_probability)
../../gcc/gcc/dojump.c:261
0x6dc3cc expand_gimple_cond
../../gcc/gcc/cfgexpand.c:2495
0x6dc3cc expand_gimple_basic_block
../../gcc/gcc/cfgexpand.c:5674
0x6dff66 execute
../../gcc/gcc/cfgexpand.c:6425
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug tree-optimization/85588] New: -fwrapv miscompilation

2018-05-01 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85588

Bug ID: 85588
   Summary: -fwrapv miscompilation
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

GCC miscompiles gcc/testsuite/gcc.dg/torture/pr57656.c when using -fwrapv

  > gcc -fwrapv pr57656.c
  > ./a.out
  Abort (core dumped)

The problem seems to be exactly the same as in PR57656 (but when using
-fwrapv):
  t = 1 - ((a - b) / c);
is changed to
  t = (b - a) / c + 1;
which is not the same in this case where both (a - b) and (b - a) have the
value 0x80000000.

This fails in GCC 6 and newer versions. Compiling using GCC 5 produces the
correct result.
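
The two expressions can be compared under wrapping semantics by doing the arithmetic in unsigned types. This is a sketch: the helper names and the input values (a = INT32_MIN, b = 0, c = 2) are chosen for illustration and are not taken from pr57656.c, and the conversions back to int32_t rely on GCC's documented modulo behavior.

```c
#include <stdint.h>

/* Two's-complement wrapping subtraction and addition, as -fwrapv
   defines them for signed int.  */
static int32_t wrap_sub (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a - (uint32_t) b);
}

static int32_t wrap_add (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a + (uint32_t) b);
}

/* t = 1 - ((a - b) / c)  */
static int32_t original_form (int32_t a, int32_t b, int32_t c)
{
  return wrap_sub (1, wrap_sub (a, b) / c);
}

/* t = (b - a) / c + 1  */
static int32_t folded_form (int32_t a, int32_t b, int32_t c)
{
  return wrap_add (wrap_sub (b, a) / c, 1);
}
```

When a - b wraps to INT32_MIN, b - a wraps to INT32_MIN as well instead of being its negation, so the quotient keeps the same sign in both forms and the results differ.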

[Bug c/82296] Warn for code removal due to "code never accesses array out of bounds" assumption

2017-10-10 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82296

Krister Walfridsson  changed:

   What|Removed |Added

 CC||kristerw at gcc dot gnu.org

--- Comment #7 from Krister Walfridsson  ---
The C89 rules are the same as for C11 -- you can find the relevant text in C90
6.3.6 (it does not cover the "UB 62" from the ARR30-C page, but that is because
C89 does not have flexible array members...)

Using -std=c89 will compile following the rules in C89, so you will not suffer
from new undefined behaviors introduced in newer standards.

[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)

2017-09-29 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480

Krister Walfridsson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Krister Walfridsson  ---
Fixed for trunk and GCC 7.3.  Closing this bug as I'm not planning to backport
to GCC 6.

[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)

2017-09-29 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480

--- Comment #3 from Krister Walfridsson  ---
Author: kristerw
Date: Fri Sep 29 21:34:00 2017
New Revision: 253309

URL: https://gcc.gnu.org/viewcvs?rev=253309&root=gcc&view=rev
Log:
2017-09-29  Krister Walfridsson  

Backport from mainline
2017-06-29  Maya Rashish  

PR target/77480
* config/netbsd.h (NETBSD_LIB_SPEC): Add -lc when creating shared
objects.

Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/netbsd.h

[Bug target/39570] cabs and cabsf are named differently on NetBSD 5

2017-09-29 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39570

--- Comment #14 from Krister Walfridsson  ---
Author: kristerw
Date: Fri Sep 29 09:38:08 2017
New Revision: 253283

URL: https://gcc.gnu.org/viewcvs?rev=253283&root=gcc&view=rev
Log:
2017-09-29  Krister Walfridsson  

Backport from mainline
2017-09-26  Krister Walfridsson  

PR target/39570
* gcc/config/netbsd-protos.h: New file.
* gcc/config/netbsd.c: New file.
* gcc/config/netbsd.h (SUBTARGET_INIT_BUILTINS): Define.
* gcc/config/t-netbsd: New file.
* gcc/config.gcc (tm_p_file): Add netbsd-protos.h.
(tmake_file) Add t-netbsd.
(extra_objs) Add netbsd.o.

Added:
branches/gcc-7-branch/gcc/config/netbsd-protos.h
branches/gcc-7-branch/gcc/config/netbsd.c
branches/gcc-7-branch/gcc/config/t-netbsd
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config.gcc
branches/gcc-7-branch/gcc/config/netbsd.h

[Bug target/77480] netbsd specfile will not link against libc when building -shared (+patch)

2017-09-28 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77480

Krister Walfridsson  changed:

   What|Removed |Added

 CC||kristerw at gcc dot gnu.org

--- Comment #2 from Krister Walfridsson  ---
Fixed on trunk by r249822

[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO

2017-09-28 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

Krister Walfridsson  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Krister Walfridsson  ---
Fixed for trunk and GCC 7.3 (GCC 6 and 5 do not have this problem).

[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO

2017-09-28 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

--- Comment #12 from Krister Walfridsson  ---
Author: kristerw
Date: Thu Sep 28 19:17:51 2017
New Revision: 253263

URL: https://gcc.gnu.org/viewcvs?rev=253263&root=gcc&view=rev
Log:
gcc/ChangeLog:

Backport from mainline
2017-05-14  Krister Walfridsson  

PR target/80600
* config/netbsd.h (NETBSD_LIBGCC_SPEC): Always add -lgcc.

libgcc/ChangeLog:

Backport from mainline
2017-05-14  Krister Walfridsson  

PR target/80600
* config.host (*-*-netbsd*): Add t-slibgcc-libgcc to tmake_file.

Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/netbsd.h
branches/gcc-7-branch/libgcc/ChangeLog
branches/gcc-7-branch/libgcc/config.host

[Bug target/39570] cabs and cabsf are named differently on NetBSD 5

2017-09-26 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39570

--- Comment #13 from Krister Walfridsson  ---
Author: kristerw
Date: Tue Sep 26 21:26:21 2017
New Revision: 253216

URL: https://gcc.gnu.org/viewcvs?rev=253216&root=gcc&view=rev
Log:
2017-09-26  Krister Walfridsson  

PR target/39570
* gcc/config/netbsd-protos.h: New file.
* gcc/config/netbsd.c: New file.
* gcc/config/netbsd.h (SUBTARGET_INIT_BUILTINS): Define.
* gcc/config/t-netbsd: New file.
* gcc/config.gcc (tm_p_file): Add netbsd-protos.h.
(tmake_file) Add t-netbsd.
(extra_objs) Add netbsd.o.

Added:
trunk/gcc/config/netbsd-protos.h
trunk/gcc/config/netbsd.c
trunk/gcc/config/t-netbsd
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config.gcc
trunk/gcc/config/netbsd.h

[Bug middle-end/82177] Alias analysis too aggressive with integer-to-pointer cast

2017-09-20 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177

Krister Walfridsson  changed:

   What|Removed |Added

 CC||kristerw at gcc dot gnu.org

--- Comment #5 from Krister Walfridsson  ---
Did you mean PR61502 - "== comparison on "one-past" pointer gives wrong
result"?

[Bug tree-optimization/81554] New: [8 Regression] 25% performance regression in Himeno benchmark

2017-07-25 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81554

Bug ID: 81554
   Summary: [8 Regression] 25% performance regression in Himeno
benchmark
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Created attachment 41831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41831&action=edit
The Himeno benchmark

The Himeno benchmark from the Phoronix test suite lost 25% of its performance
with r248771, which fixed PR 66313 ("Unsafe factorization of a*b+a*c").

The benchmark is attached; it can be compiled as
  gcc -O3 himenobmtxpa.c
and run as
  ./a.out s

I see a 15% slowdown when the benchmark is compiled as "-O3" and 25% if compiled
as "-O3 -march=native" on a Broadwell CPU.

[Bug tree-optimization/81409] New: Inefficient loops generated from range-v3 code

2017-07-12 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81409

Bug ID: 81409
   Summary: Inefficient loops generated from range-v3 code
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

Created attachment 41728
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41728&action=edit
Preprocessed file for run_range()

The range-v3 (https://github.com/ericniebler/range-v3) function
  long run_range(std::vector<int> const &lengths, long to_find)
  {
    auto const found_index = ranges::distance(lengths
        | ranges::view::transform(ranges::convert_to<long>{})
        | ranges::view::partial_sum()
        | ranges::view::take_while([=](auto const i) {
            return !(to_find < i);
          }));
    return found_index;
  }
compiles to slow code with GCC, needing 3x the time to run compared to the
code generated by LLVM (when compiled with "-O3 -std=c++14 -DNDEBUG"). The
calculation done in run_range() is the equivalent of
  long run_forloop(std::vector<int> const &vec, long to_find)
  {
    long len = vec.end() - vec.begin();
    const int *p = &vec[0];
    long i, acc = 0;
    for (i = 0; i < len; i++) {
      acc += p[i];
      if (to_find < acc)
        break;
    }
    return i;
  }
and LLVM manages to generate similar code for both functions, while GCC seems
to be confused by the run_range() loop and generates extra comparisons and a
somewhat messy code flow...

[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO

2017-05-14 Thread kristerw at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

--- Comment #11 from Krister Walfridsson  ---
Author: kristerw
Date: Sun May 14 22:49:03 2017
New Revision: 248037

URL: https://gcc.gnu.org/viewcvs?rev=248037=gcc=rev
Log:
PR target/80600 - hidden symbol '__cpu_model' is referenced by DSO

gcc/ChangeLog:

PR target/80600
* config/netbsd.h (NETBSD_LIBGCC_SPEC): Always add -lgcc.

libgcc/ChangeLog:

PR target/80600
* config.host (*-*-netbsd*): Add t-slibgcc-libgcc to tmake_file.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/netbsd.h
trunk/libgcc/ChangeLog
trunk/libgcc/config.host
