[Bug tree-optimization/102392] Failure to optimize a sign extension to a zero extension

2024-07-16 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Gabriel Ravier  changed:

           What    |Removed         |Added
----------------------------------------------------------------------------
             Target|X86_64-linux-gnu|x86_64-linux-gnu
            Version|12.0            |15.0

--- Comment #5 from Gabriel Ravier  ---
I've stumbled upon a very similar bug (which I think is the same bug at its
core) while examining the following code:

#include <stddef.h>
#include <stdint.h>

static uint32_t f(int8_t x)
{
    return (~(uint32_t)x) & 1;
}

void floop(uint32_t *r, int8_t *x, size_t n)
{
#ifndef __clang__
    _Pragma("GCC unroll 0") _Pragma("GCC novector")
#else
    _Pragma("clang loop unroll(disable) vectorize(disable)")
#endif
    for (size_t i = 0; i < n; ++i)
        r[i] = f(x[i]);
}

where, for the loop, GCC generates:

.L3:
  movsx eax, BYTE PTR [rsi+rdx]    # <--- sign extension
  not eax
  and eax, 1
  mov DWORD PTR [rdi+rdx*4], eax
  add rdx, 1
  cmp rcx, rdx
  jne .L3

whereas LLVM manages:

.LBB0_2: # =>This Inner Loop Header: Depth=1
  movzx ecx, byte ptr [rsi + rax]   # <--- zero extension
  not ecx
  and ecx, 1
  mov dword ptr [rdi + 4*rax], ecx
  inc rax
  cmp rdx, rax
  jne .LBB0_2

which makes LLVM's output slightly faster (according to llvm-mca) for the same
reason (i.e. GCC's failure to convert the sign extension to a zero extension).
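
The transformation is legal here because the & 1 mask keeps only bit 0 of x,
which both extensions preserve. As a minimal sketch (the driver below is
illustrative, not part of the testcase), the equivalence can be checked
exhaustively over all int8_t values:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    for (int v = -128; v <= 127; ++v) {
        int8_t x = (int8_t)v;
        /* What GCC emits: sign-extend first (movsx), then ~ and & 1. */
        uint32_t via_sext = (~(uint32_t)(int32_t)x) & 1;
        /* What LLVM emits: zero-extend first (movzx) instead. */
        uint32_t via_zext = (~(uint32_t)(uint8_t)x) & 1;
        assert(via_sext == via_zext);  /* bit 0 agrees either way */
    }
    return 0;
}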

[Bug tree-optimization/102392] Failure to optimize a sign extension to a zero extension

2021-09-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

--- Comment #4 from Richard Biener  ---
Note that without explicit ZEXT_EXPR / SEXT_EXPR, the GIMPLE for a
non-"natural" extension is more costly (more statements) than the "natural"
extension that follows the signedness of the object, because it requires an
intermediate sign-changing conversion.
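
Concretely, a source-level sketch of the two GIMPLE shapes being described
(function names are illustrative only):

#include <stdint.h>

/* "Natural" extension: follows the signedness of the source object,
   so it is a single conversion statement in GIMPLE. */
int32_t natural(int8_t x)
{
    return x;                  /* one stmt: sign-extend */
}

/* Non-"natural" (zero) extension of a signed object: needs an
   intermediate sign-changing conversion, i.e. one extra statement. */
int32_t non_natural(int8_t x)
{
    uint8_t u = (uint8_t)x;    /* sign-changing conversion */
    return u;                  /* then zero-extend */
}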

[Bug tree-optimization/102392] Failure to optimize a sign extension to a zero extension

2021-09-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Andrew Pinski  changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
           Severity|normal      |enhancement
     Ever confirmed|0           |1
   Last reconfirmed|            |2021-09-18
             Status|UNCONFIRMED |NEW
          Component|target      |tree-optimization
           Keywords|ABI         |

--- Comment #3 from Andrew Pinski  ---
A better testcase, which shows what LLVM is really doing, is:

#include <stdint.h>

void g(int32_t *x, int64_t *y)
{
    if (*x < 0)
        __builtin_unreachable();
    *y = *x;
}

GCC:
movslq  (%rdi), %rax
movq    %rax, (%rsi)
ret

vs. LLVM:
movl    (%rdi), %eax
movq    %rax, (%rsi)
retq
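
LLVM uses the range information from the __builtin_unreachable() hint: once
*x is known to be non-negative, sign extension and zero extension agree, and
on x86-64 a 32-bit movl already zero-extends into the full 64-bit register,
so it can replace the movslq. A minimal sketch (illustrative driver, not part
of the testcase) checking that equivalence:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

int main(void)
{
    /* Non-negative values: the case the unreachable hint establishes. */
    int32_t samples[] = { 0, 1, 42, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        int64_t sext = (int64_t)samples[i];            /* movslq */
        int64_t zext = (int64_t)(uint32_t)samples[i];  /* movl */
        assert(sext == zext);
    }
    return 0;
}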

There might be a dup of this bug somewhere too.