[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2021-08-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
         Resolution|FIXED                       |---
   Target Milestone|9.0                         |---

--- Comment #8 from Andrew Pinski  ---
Well, it was just this case that was fixed.
Here is another one which is still broken:
unsigned int
adds_shift_ext (unsigned long long a, unsigned short b, unsigned c)
{
  unsigned long long d = (a - ((unsigned long long) b << 3));

  if (d == 0)
    return a + c + b;
  else
    return b + d + c;
}

Note: I think there is also a missed reassociation/code-hoisting opportunity:

   [local count: 536870913]:
  _3 = (unsigned int) a_11(D);
  _4 = _3 + c_13(D);
  _15 = _4 + _8;
  goto ; [100.00%]

   [local count: 536870913]:
  _7 = (unsigned int) d_12;
  _17 = _8 + c_13(D);
  _14 = _7 + _17;

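Once (_3 + c_13(D)) + _8 on the first path is reassociated as
_3 + (c_13(D) + _8), c_13(D) + _8 is fully redundant here: both paths compute
it, so it could be hoisted above the branch.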
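A source-level sketch of that reassociation plus hoisting, written against the comment #8 test case, is below. The second function name, adds_shift_ext_hoisted, is hypothetical and only illustrates the transformation: the shared sum c + b is computed once before the branch. This is valid because the final conversion to unsigned int reduces every sum modulo 2^32, so the order of the additions does not matter.

```c
/* Test case from comment #8, unchanged in behaviour. */
unsigned int
adds_shift_ext (unsigned long long a, unsigned short b, unsigned c)
{
  unsigned long long d = (a - ((unsigned long long) b << 3));

  if (d == 0)
    return a + c + b;
  else
    return b + d + c;
}

/* Hypothetical hand-hoisted version: the common subexpression c + b is
   evaluated once, before the branch.  Equivalent to the function above
   because the return type truncates every sum modulo 2^32.  */
unsigned int
adds_shift_ext_hoisted (unsigned long long a, unsigned short b, unsigned c)
{
  unsigned long long d = (a - ((unsigned long long) b << 3));
  unsigned int cb = c + b;   /* shared by both paths */

  if (d == 0)
    return (unsigned int) a + cb;
  else
    return (unsigned int) d + cb;
}
```

Both functions should agree for every input; the hoisted form simply makes the shared addition explicit.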

[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2021-08-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |9.0
             Status|NEW                         |RESOLVED

--- Comment #7 from Andrew Pinski  ---
GCC9+ does:
        subs    x3, x0, x1, sxth 3
        add     w1, w2, w1, sxth
        add     w1, w1, w3
        add     w0, w2, w0
        csel    w0, w1, w0, ne
        ret

GCC 8 produced:
        sxth    w1, w1
        subs    x3, x0, x1, sxth 3
        add     w1, w1, w2
        add     w1, w1, w3
        add     w0, w2, w0
        csel    w0, w1, w0, ne
        ret
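A small C model of the AArch64 sxth extended-register operand may help show why the separate "sxth w1, w1" in the GCC 8 output is redundant. The function names are hypothetical, registers are modelled as plain unsigned integers, and condition flags are ignored; the sketch only checks that an explicit sxth followed by a plain add yields the same value as a single add with an sxth operand.

```c
#include <stdint.h>

/* Model of the "sxth" extend: keep bits [15:0] and sign-extend them. */
static int64_t sxth (uint64_t reg)
{
  int64_t v = (int64_t) (reg & 0xFFFFu);
  return (v & 0x8000) ? v - 0x10000 : v;
}

/* GCC 8:  sxth w1, w1   followed by   add w1, w1, w2  */
static uint32_t gcc8_add (uint32_t w1, uint32_t w2)
{
  uint32_t ext = (uint32_t) sxth (w1);   /* sxth w1, w1    */
  return ext + w2;                       /* add w1, w1, w2 */
}

/* GCC 9:  add w1, w2, w1, sxth   (the extend is folded into the add) */
static uint32_t gcc9_add (uint32_t w1, uint32_t w2)
{
  return w2 + (uint32_t) sxth (w1);
}

/* Both versions:  subs x3, x0, x1, sxth 3  ->  x3 = x0 - (sxth (x1) << 3) */
static uint64_t subs_sxth3 (uint64_t x0, uint64_t x1)
{
  return x0 - ((uint64_t) sxth (x1) << 3);
}
```

Since the subs already extends x1 itself, the only other consumer of the GCC 8 sxth is the 32-bit add, which can take the extend as an operand instead.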

GCC 9's combine is able to do this:
Trying 3 -> 8:
    3: r99:SI=sign_extend(x1:HI)
      REG_DEAD x1:HI
    8: r101:DI=sign_extend(r99:SI#0)
Failed to match this instruction:
(parallel [
        (set (reg:DI 101 [ b ])
            (sign_extend:DI (reg:HI 1 x1 [ b ])))
        (set (reg/v:SI 99 [ b ])
            (sign_extend:SI (reg:HI 1 x1 [ b ])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:DI 101 [ b ])
            (sign_extend:DI (reg:HI 1 x1 [ b ])))
        (set (reg/v:SI 99 [ b ])
            (sign_extend:SI (reg:HI 1 x1 [ b ])))
    ])
Successfully matched this instruction:
(set (reg/v:SI 99 [ b ])
    (sign_extend:SI (reg:HI 1 x1 [ b ])))
Successfully matched this instruction:
(set (reg:DI 101 [ b ])
    (sign_extend:DI (reg:HI 1 x1 [ b ])))
allowing combination of insns 3 and 8
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8

So fixed by r9-2064.
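The combination in the log above rests on a simple identity: sign-extending a 16-bit value to 32 bits and then sign-extending that 32-bit result to 64 bits gives the same value as sign-extending the 16-bit value straight to 64 bits. That is what lets the rewritten insn 8 read x1:HI directly instead of r99, at an unchanged cost of 8. The C check below is only an illustration, not GCC code; the helper names are invented for this sketch, and it tests the identity exhaustively over all 16-bit inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* sign_extend:SI (x:HI), 16 -> 32 bits. */
static int32_t sext_hi_to_si (uint16_t h)
{
  int32_t v = h;
  return (v & 0x8000) ? v - 0x10000 : v;
}

/* sign_extend:DI (x:HI), 16 -> 64 bits in one step. */
static int64_t sext_hi_to_di (uint16_t h)
{
  int64_t v = h;
  return (v & 0x8000) ? v - 0x10000 : v;
}

/* sign_extend:DI (r99:SI#0), 32 -> 64 bits applied to the SImode result. */
static int64_t sext_si_to_di (int32_t s)
{
  return (int64_t) s;
}

/* True if extending twice always equals extending once, i.e. insn 8 may
   use x1:HI directly without changing its value. */
static bool double_extend_is_single_extend (void)
{
  for (uint32_t h = 0; h <= 0xFFFF; h++)
    if (sext_si_to_di (sext_hi_to_si ((uint16_t) h))
        != sext_hi_to_di ((uint16_t) h))
      return false;
  return true;
}
```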

[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2016-08-12 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-28 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Ramana Radhakrishnan ramana at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-28
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #6 from Ramana Radhakrishnan ramana at gcc dot gnu.org ---
(In reply to kugan from comment #5)
> Is this sort of multiple-use potential candidate for ree pass? Haven't
> looked ree in detail yet.

IIRC REE can deal with multiple-use potential candidates; however, it probably
needs some work to look into the complex zero/sign-extension patterns that the
AArch64 port has. This is certainly not GCC 5.0 material now, though :)


[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-18 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #5 from kugan at gcc dot gnu.org ---
Is this sort of multiple-use case a potential candidate for the REE pass? I
haven't looked at REE in detail yet.


[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #1 from kugan at gcc dot gnu.org ---
According to AAPCS64
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf),
the unused bits of a parameter register have unspecified values, so I think the
sxth is needed.


[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org ---
(In reply to kugan from comment #1)
> According to AAPCS64
> (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf),
> the unused parm register bits have unspecified value. So I think it is
> needed

It is not needed, because the next instruction already applies its own sign
extension, and the other uses of the sign-extended result only read the lower
32 bits of the register.
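One way to reconcile this with the AAPCS64 point from comment #1: the bits above bit 15 of the register carrying b are unspecified, but if every instruction that reads it applies its own sxth extend, those bits can never affect the result. The sketch below models the GCC 9 sequence from comment #7 in C (the function name and the mapping of registers to parameters are hypothetical) and checks that filling the upper bits of x1 with arbitrary values leaves the returned value unchanged.

```c
#include <stdint.h>

/* "sxth" extend: keep bits [15:0] and sign-extend them. */
static int64_t sxth (uint64_t reg)
{
  int64_t v = (int64_t) (reg & 0xFFFFu);
  return (v & 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical C model of the GCC 9 listing.  x0..x2 are the incoming
   argument registers; only bits [15:0] of x1 are meaningful.  */
static uint32_t gcc9_sequence (uint64_t x0, uint64_t x1, uint64_t x2)
{
  uint64_t x3 = x0 - ((uint64_t) sxth (x1) << 3);      /* subs x3, x0, x1, sxth 3 */
  uint32_t w1 = (uint32_t) x2 + (uint32_t) sxth (x1);  /* add  w1, w2, w1, sxth   */
  w1 = w1 + (uint32_t) x3;                             /* add  w1, w1, w3         */
  uint32_t w0 = (uint32_t) x2 + (uint32_t) x0;         /* add  w0, w2, w0         */
  return x3 != 0 ? w1 : w0;                            /* csel w0, w1, w0, ne     */
}
```

Every read of x1 goes through sxth, so only its low 16 bits reach the result, which is why no standalone extension is required.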


[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-08 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #3 from kugan at gcc dot gnu.org ---
But isn't w1 passed a 16-bit value (short b) here? Am I missing something?


[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated

2015-01-08 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #4 from Richard Earnshaw rearnsha at gcc dot gnu.org ---
b is used twice, once shifted left by 3 and once directly.

We could write this as

        subs    x3, x0, x1, sxth 3
        beq     .L5
        add     w0, w2, w1, sxth        <= Now extended
        add     w0, w0, w3
        ret
        .p2align 2
.L5:
        add     w0, w2, w0
        ret

which in this specific case would perhaps be more efficient, but in practice
it's quite hard to get this sort of multiple-use right.

I think this is a special case, however, of the more common 'un-cse' type of
problem, where multiple uses of an extended (or shifted) value are always
commoned up.

Note that modern CPUs may take an extra cycle to perform an ALU-with-shift type
operation, eliminating the benefit of sinking multiple uses down into the ALU
operations themselves.
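A back-of-envelope latency model makes this last point concrete. It assumes, purely for illustration, that a plain ALU instruction takes 1 cycle and that an extended or shifted operand adds delta cycles; real cores differ. On the path from the incoming w1 to the result of the add:

```latex
% Hypothetical costs: plain ALU op = 1 cycle; extended/shifted operand adds \delta.
t_{\text{separate}} = \underbrace{1}_{\texttt{sxth}} + \underbrace{1}_{\texttt{add}} = 2,
\qquad
t_{\text{folded}} = \underbrace{1 + \delta}_{\texttt{add ..., sxth}}
% Folding shortens the path only if t_folded < t_separate, i.e. \delta < 1.
% With \delta = 1 (the extra cycle mentioned above) there is no latency gain.
```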