[Bug tree-optimization/94782] Simple multiplication-related arithmetic not optimized to direct multiplication

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

--- Comment #4 from Andrew Pinski ---
At the RTL level, this was fixed for x86_64 by
r11-6456-g4615cde5d7ef281d4b554df411f82ad707f0a54d (aka PR 98334).

2023-02-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |TREE

--- Comment #3 from Andrew Pinski ---
It is fixed at the RTL level but not at the GIMPLE level.


Combine does it for x86_64:
Trying 7, 8 -> 9:
7: {r91:SI=r93:SI-0x1;clobber flags:CC;}
  REG_DEAD r93:SI
  REG_UNUSED flags:CC
8: {r92:SI=r91:SI*r94:SI;clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_DEAD r91:SI
9: r90:SI=r92:SI+r94:SI
  REG_DEAD r94:SI
  REG_DEAD r92:SI
Successfully matched this instruction:
(set (reg:SI 90)
    (mult:SI (reg:SI 93)
        (reg:SI 94)))
allowing combination of insns 7, 8 and 9
original costs 4 + 12 + 4 = 20
replacement cost 12

But it fails to do it on aarch64:

Trying 7 -> 14:
7: r101:SI=r103:SI-0x1
  REG_DEAD r103:SI
14: x0:SI=r101:SI*r104:SI+r104:SI
  REG_DEAD r101:SI
  REG_DEAD r104:SI

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

--- Comment #2 from Gabriel Ravier ---
Appears to be fixed on trunk.

2020-04-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94782

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-04-27

--- Comment #1 from Richard Biener ---
The inner (a - 1U) * b is unsigned, whereas a * b would be signed and so
would have undefined overflow; we therefore cannot optimize to a * b.  But
we could indeed optimize to (unsigned)a * (unsigned)b.  fold-const.c
contains related transforms that could be amended.  reassoc could do it as
well, but would need enhancement for signed arithmetic.
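
To illustrate the point in comment #1, here is a minimal C sketch.  It
assumes the expression under discussion has the shape (a - 1U) * b + b with
int operands; the original test case is not quoted in this thread, so the
function names and the exact form are hypothetical.  Both functions return
unsigned so that every step is wrapping unsigned arithmetic and no signed
overflow is involved.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed shape of the unoptimized expression: a and b are converted to
   unsigned by the usual arithmetic conversions, so every step wraps. */
unsigned mul_via_sub(int a, int b)
{
    return (a - 1U) * b + b;
}

/* The form suggested in comment #1: a single unsigned multiplication. */
unsigned mul_direct(int a, int b)
{
    return (unsigned)a * (unsigned)b;
}

/* Returns 1 if both forms agree on a few sample inputs, including pairs
   for which a signed a * b would overflow; returns 0 otherwise. */
int forms_agree(void)
{
    static const int samples[] = { 0, 1, -1, 2, 65536, INT_MAX, INT_MIN };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (mul_via_sub(samples[i], samples[j])
                != mul_direct(samples[i], samples[j]))
                return 0;
    return 1;
}
```

The two forms agree because (a - 1) * b + b == a * b holds modulo 2^N for
unsigned arithmetic.  Rewriting to a signed a * b instead would introduce
undefined behaviour for inputs such as a = INT_MAX, b = 2, which is why
only the unsigned product is a valid replacement.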