[Bug tree-optimization/104357] [Aarch64] Failure to use csinv instead of mvn+csel where possible

2022-02-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104357

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Andrew Pinski  ---
One thing I should note:
  _7 = x_3(D) >= 0;
  _6 = (unsigned char) _7;
  _8 = -_6;

Should be done on the gimple level as:
t = ~(x_3(D) >> (sizeof(x_3(D))*8 - 1));
_8 = (unsigned char)t;

And then we can factor out the cast and I think it will produce the same code.

And yes it does, that is:
unsigned char stbi__clamp(int x)
{
   int t = x;
   if ((unsigned)x > 255) {
      t = ~(x >> 31);   /* 0 for x < 0, -1 (255 after the cast) for x > 255 */
   }
   return t;
}

So Mine for GCC 13.

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-02-02
  Component|target  |tree-optimization
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
This will get GCC closer to what clang/LLVM produces:
unsigned char stbi__clamp(int x)
{
   int t = x;
   if ((unsigned)x > 255) {
      if (x < 0) t = 0;
      else if (x > 255) t = -1;
   }
   return t;
}

---- CUT ----
The zero-extends come from the cast not being moved out past the csel, and the
RTL level is not very good at cross-BB optimizations.
The gimple level looks like:
  <bb 2> [local count: 1073741824]:
  x.0_1 = (unsigned int) x_3(D);
  if (x.0_1 > 255)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _7 = x_3(D) >= 0;
  _6 = (unsigned char) _7;
  _8 = -_6;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870913]:
  _4 = (unsigned char) x_3(D);

  <bb 5> [local count: 1073741824]:
  # _2 = PHI <_8(3), _4(4)>
  return _2;

In theory, this could be improved to what I gave above.
The gimple level has no knowledge of what the RTL/target level will do: with an
unsigned char result, a zero-extend is still needed.