[Bug rtl-optimization/111835] Suboptimal codegen: zero extended load instead of sign extended one

2023-11-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111835

--- Comment #4 from Andrew Pinski  ---
(In reply to Siarhei Volkau from comment #3)
> I don't think this is a duplicate of bug 104387, because there's only
> one store.
> Also, this bug simply disappears if we change the source code a bit,
> e.g.
>  - change (int8_t)*src; to *(int8_t*)src;
>  - or change the argument uint8_t * dst to int8_t * dst
> 
> But if we have multiple stores, the extension will remain in any case.

One store, but two uses of the loaded value.

2023-11-01 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111835

--- Comment #3 from Siarhei Volkau  ---
I don't think this is a duplicate of bug 104387, because there's only one
store.
Also, this bug simply disappears if we change the source code a bit,
e.g.
 - change (int8_t)*src; to *(int8_t*)src;
 - or change the argument uint8_t * dst to int8_t * dst

But if we have multiple stores, the extension will remain in any case.
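
For readers without the original attachment, the source variants described
above can be sketched as follows. This is a hypothetical reconstruction, not
the reporter's test case: the function names, the const qualifiers and the
exact statement order are assumptions inferred from the comment text.

```c
#include <stdint.h>

/* Assumed shape of the reported function: the VALUE is cast to int8_t.
   Converting an out-of-range value to int8_t is implementation-defined in
   ISO C before C23; GCC documents it as reduction modulo 2^8. */
int copy_cast_value(const uint8_t *src, uint8_t *dst)
{
    int8_t v = (int8_t)*src;
    *dst = (uint8_t)v;      /* the single store */
    return v;               /* second use: widened (sign-extended) to int */
}

/* First change from the comment: cast the POINTER, not the value. */
int copy_cast_pointer(const uint8_t *src, uint8_t *dst)
{
    int8_t v = *(const int8_t *)src;
    *dst = (uint8_t)v;
    return v;
}

/* Second change from the comment: dst becomes int8_t *. */
int copy_signed_dst(const uint8_t *src, int8_t *dst)
{
    int8_t v = (int8_t)*src;
    *dst = v;
    return v;
}
```

All three variants compute the same result, so any difference in the
generated code comes from how the load and the extension are represented,
not from the semantics of the source.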

2023-10-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111835

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Dup of bug 104387.

*** This bug has been marked as a duplicate of bug 104387 ***

2023-10-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111835

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-16
          Component|middle-end                  |rtl-optimization
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
             Target|                            |aarch64
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski  ---
So, as you said, it depends on the target.
Most non-x86 targets have:
/* Define if loading from memory in MODE, an integral mode narrower than
   BITS_PER_WORD will either zero-extend or sign-extend.  The value of this
   macro should be the code that says which one of the two operations is
   implicitly done, or UNKNOWN if none.  */
#define LOAD_EXTEND_OP(MODE) ZERO_EXTEND

defined.

This confuses REE, because the byte load has already become a zero_extend.
Before REE (aarch64):
(insn 7 10 9 2 (set (reg:SI 0 x0 [orig:92 _1 ] [92])
        (zero_extend:SI (mem:QI (reg:DI 0 x0 [99]) [0 *src_3(D)+0 S1 A8])))
     "/app/example.cpp":4:39 146 {*zero_extendqisi2_aarch64}
     (nil))
(insn 9 7 15 2 (set (mem:QI (reg:DI 1 x1 [100]) [0 *dst_5(D)+0 S1 A8])
        (reg:QI 0 x0 [orig:92 _1 ] [92]))
     "/app/example.cpp":5:10 62 {*movqi_aarch64}
     (nil))
(insn 15 9 16 2 (set (reg/i:SI 0 x0)
        (sign_extend:SI (reg:QI 0 x0 [orig:92 _1 ] [92])))
     "/app/example.cpp":7:1 142 {*extendqisi2_aarch64}
     (nil))

This means that REE does not eliminate the sign extend (insn 15).


Note that on x86 we get, before REE:
(insn 7 4 8 2 (set (reg:QI 0 ax [orig:98 _1 ] [98])
        (mem:QI (reg:DI 5 di [104]) [0 *src_3(D)+0 S1 A8]))
     "/app/example.cpp":4:39 93 {*movqi_internal}
     (nil))
(insn 8 7 9 2 (set (mem:QI (reg:DI 4 si [105]) [0 *dst_5(D)+0 S1 A8])
        (reg:QI 0 ax [orig:98 _1 ] [98]))
     "/app/example.cpp":5:10 93 {*movqi_internal}
     (nil))
(insn 9 8 15 2 (set (reg:SI 0 ax [orig:103 _1 ] [103])
        (sign_extend:SI (reg:QI 0 ax [orig:98 _1 ] [98])))
     "/app/example.cpp":6:12 183 {extendqisi2}
     (nil))

So REE is able to move that sign extend back to the original load.
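
To make the difference between the two dumps concrete, here is a rough C
model of what the insns compute. It is only an illustration under stated
assumptions: the helper names are invented for this sketch, and it models
the data flow of the RTL above, not REE's implementation.

```c
#include <stdint.h>

/* Portable sign extension of the low 8 bits of x (helper name is made up). */
static int32_t sext8(uint32_t x)
{
    return (int32_t)((x & 0xFFu) ^ 0x80u) - 0x80;
}

/* Models the aarch64 sequence before REE: a zero-extending load (insn 7),
   a byte store (insn 9), then a separate sign extend (insn 15). */
int32_t zero_extend_then_sign_extend(const uint8_t *src, uint8_t *dst)
{
    uint32_t w = (uint32_t)*src;   /* zero_extend:SI (mem:QI ...) */
    *dst = (uint8_t)w;             /* store of the low QI part */
    return sext8(w);               /* sign_extend:SI (reg:QI ...) */
}

/* Models the desired form: one sign-extending load feeds both uses. */
int32_t sign_extending_load(const uint8_t *src, uint8_t *dst)
{
    int32_t w = sext8(*src);       /* single sign-extending load */
    *dst = (uint8_t)w;             /* store only reads the low 8 bits */
    return w;
}
```

Because the byte store reads only the low 8 bits, it cannot observe whether
the upper bits came from a zero or a sign extension, so both functions store
the same byte and return the same value for every input.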