http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
            Summary: The constant load instruction should be hoisted out of loop
            Product: gcc
            Version: 4.6.0
             Status: UNCONFIRMED
           Severity: enhancement
           Priority: P3
          Component: target
         AssignedTo: unassig...@gcc.gnu.org
         ReportedBy: car...@google.com
             Target: arm-linux-androideabi

Created attachment 23359
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23359
testcase

The attached test case is extracted from zlib. When it is compiled with -march=armv7-a -mthumb -Os, gcc 4.6 generates:

init_block:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movs    r3, #0
.L2:
        adds    r2, r0, r3
        adds    r3, r3, #4
        movs    r1, #0                  // A
        cmp     r3, #1144
        strh    r1, [r2, #60]   @ movhi // B
        bne     .L2
        movs    r3, #0
.L3:
        adds    r2, r0, r3
        adds    r3, r3, #4
        movs    r1, #0                  // C
        cmp     r3, #120
        strh    r1, [r2, #2352] @ movhi
        bne     .L3
        movs    r2, #0
.L4:
        adds    r1, r0, r2
        adds    r2, r2, #4
        movs    r3, #0                  // D
        cmp     r2, #76
        strh    r3, [r1, #2596] @ movhi
        bne     .L4
        movs    r2, #1
        str     r3, [r0, #2760]
        strh    r2, [r0, #1084] @ movhi
        str     r3, [r0, #2756]
        str     r3, [r0, #2764]
        str     r3, [r0, #2752]
        bx      lr

Note that instruction A in loop .L2 loads the constant 0 into register r1, and instruction B then stores r1 to memory. No other instruction in the loop uses r1, so A is loop-invariant and it would be better to move it out of the loop.

Similarly, instruction C can be moved out of loop .L3. In fact it can be removed entirely: after instruction A, r1 already contains 0, and no later instruction modifies it.

Similarly, instruction D can be moved out of loop .L4. It could also be removed if the roles of r1 and r3 in loop .L4 were exchanged, so that the zero is kept in r1 and r3 is used for the address.
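For reference, the listing appears to correspond to zlib's init_block() in trees.c: dividing the compare bounds by the 4-byte element stride gives 1144 / 4 = 286, 120 / 4 = 30 and 76 / 4 = 19 iterations, which match zlib's L_CODES, D_CODES and BL_CODES, and the store of 1 to offset 1084 = 60 + 256 * 4 matches dyn_ltree[END_BLOCK] being set to 1. The sketch below is a simplified, self-contained reconstruction under that assumption, not the attached testcase: the struct `block_state`, its field names and its layout are hypothetical, so the byte offsets will not match the listing.

```c
/* Hypothetical simplified stand-ins for zlib's deflate.h definitions.
   The real deflate_state has many more fields, so the byte offsets seen
   in the listing (#60, #2352, #2596, ...) will not match this layout. */
#define L_CODES   286   /* literal/length codes: 1144 / 4 */
#define D_CODES   30    /* distance codes:       120 / 4  */
#define BL_CODES  19    /* bit-length codes:     76 / 4   */
#define END_BLOCK 256   /* end-of-block symbol            */

typedef struct {
    unsigned short freq;  /* 16-bit field written by each strh */
    unsigned short len;   /* other half of the 4-byte element  */
} ct_data;

typedef struct {          /* hypothetical, heavily reduced state */
    ct_data dyn_ltree[L_CODES];
    ct_data dyn_dtree[D_CODES];
    ct_data bl_tree[BL_CODES];
    unsigned long opt_len, static_len;
    unsigned int last_lit, matches;
} block_state;

void init_block(block_state *s)
{
    int n;

    /* Every iteration of each loop stores the same constant 0, so the
       register holding it only needs to be set once, before the loops. */
    for (n = 0; n < L_CODES; n++)  s->dyn_ltree[n].freq = 0;  /* like .L2 */
    for (n = 0; n < D_CODES; n++)  s->dyn_dtree[n].freq = 0;  /* like .L3 */
    for (n = 0; n < BL_CODES; n++) s->bl_tree[n].freq = 0;    /* like .L4 */

    s->dyn_ltree[END_BLOCK].freq = 1;
    s->opt_len = s->static_len = 0;
    s->last_lit = s->matches = 0;
}
```

Compiling a sketch like this with the options from the report is one way to check whether a per-iteration `movs rN, #0` still appears inside the loops, although the register choice and offsets may differ from the listing above.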