http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
--- Comment #3 from Carrot <carrot at google dot com> 2011-02-21 03:15:45 UTC ---
> Any ideas of how this improvement could be implemented, Carrot?

The root cause of this problem is that ARM/Thumb store instructions can't
store an immediate directly to memory, but gcc doesn't realize this early
enough. For most of the RTL phase, the following form is kept:

(insn 41 38 42 3 (set (mem:HI (plus:SI (reg/f:SI 169)
                (const_int 60 [0x3c])) [2 MEM[(struct deflate_state *)D.2085_3 + 60B]+0 S2 A16])
        (const_int 0 [0])) src/trees.c:45 696 {*thumb2_movhi_insn}
     (expr_list:REG_DEAD (reg/f:SI 169)
        (nil)))

Only at register allocation does gcc discover the restriction of the store
instruction and split the insn into two instructions: load 0 into a register,
then store that register to memory. By then it is too late for loop
optimization to hoist the constant load.

One possible method is to split this insn before loop optimization (maybe
directly in the expand pass), and let the loop and CSE optimizations do the
rest. This may increase register pressure in parts of the program, so we
should rematerialize the constant in such cases.
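
To make the intended effect concrete, here is a rough source-level picture of what the early split would let the loop optimizer do. This is only a sketch: `struct entry`, `N_ENTRIES` and both function names are made up for illustration and are not the code from src/trees.c, and at the C level a compiler is free to treat the two functions identically. The point is just the shape of the result, with the zero loaded once outside the loop instead of being re-created next to every halfword store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure with 16-bit fields;
 * this is NOT the real deflate_state from zlib. */
#define N_ENTRIES 8

struct entry {
    uint16_t freq;
    uint16_t len;
};

/* The constant 0 appears directly in each store. This mirrors the
 * RTL form "(set (mem:HI ...) (const_int 0))" that is kept until
 * register allocation, where it is finally split into
 * "load 0 into a register" + "store register" inside the loop. */
void clear_freq_immediate(struct entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        e[i].freq = 0;
}

/* Hand-written picture of the result after an early split: the
 * constant is materialized once before the loop (as loop-invariant
 * motion would arrange) and every iteration only does the store. */
void clear_freq_hoisted(struct entry *e, size_t n)
{
    const uint16_t zero = 0;   /* loaded once, outside the loop */

    for (size_t i = 0; i < n; i++)
        e[i].freq = zero;
}
```

Both functions compute the same result; the second one keeps an extra value live across the loop, which is the register pressure cost mentioned above.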