https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122470

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So what I find potentially more interesting here is the failure of the RTL
optimizers to simplify that store-load sequence.  The problem is most likely
the sizes of the accesses, a 1-byte (QI) store followed by a 4-byte (SI) load
from the same address:

(insn 7 4 10 2 (set (mem/j:QI (reg/v/f:DI 135 [ out ]) [1 out_4(D)->f_1+0 S1 A32])
        (const_int 0 [0])) "j.c":10:46 282 {*movqi_internal}
     (nil))
[ ... ]
(insn 12 11 13 2 (set (reg:DI 142)
        (sign_extend:DI (mem/j:SI (reg/v/f:DI 135 [ out ]) [1 out_4(D)->f_2+-1 S4 A32]))) "j.c":10:46 125 {*extendsidi2_internal}
     (nil))
(insn 13 12 14 2 (set (reg:SI 141)
        (subreg:SI (reg:DI 142) 0)) "j.c":10:46 276 {*movsi_internal}
     (nil))
(insn 14 13 15 2 (set (reg:DI 143)
        (and:DI (subreg:DI (reg:SI 141) 0)
            (const_int 255 [0xff]))) "j.c":10:46 104 {*anddi3}
     (nil))


It's not until after CSE2 that things clean up meaningfully.  But nothing after
CSE2 is likely to clean this up.  I'm not sure what the best path forward is,
but it's clearly not a trivial problem.
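
For illustration only, here is a rough C sketch of the shape of the store-load
sequence in the dump above.  It is NOT the j.c testcase from this report (that
source isn't shown here); the function names are made up, and it assumes a
little-endian target, where the low 8 bits of the 4-byte load are exactly the
byte that was just stored.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reduction, not the reporter's j.c.
   Assumes a little-endian target and that p points to at least 4 bytes.  */
static uint32_t store_byte_then_load_word(unsigned char *p)
{
    uint32_t w;

    p[0] = 0;                  /* 1-byte store of zero, like insn 7 */
    memcpy(&w, p, sizeof w);   /* 4-byte load from the same address, like insn 12 */
    return w & 0xff;           /* keep only the low byte, like insn 14 */
}

/* What one would hope the sequence simplifies to on little-endian:
   the masked value is just the byte that was stored.  */
static uint32_t hoped_for_result(unsigned char *p)
{
    p[0] = 0;
    return 0;
}
```

Both functions leave memory in the same state and return the same value on a
little-endian target; the point is that getting from the first to the second
requires seeing through a narrow store feeding a wider load.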
