Hi,

As reported in PR57540, GCC chooses a bad addressing mode, with the result that:
  a) the invariant part of the address expression is not kept or hoisted;
  b) additional computation is emitted that should have been encoded in the
     address expression.

The reason is that when GCC runs into "addr+offset" (which is invalid) during
expansion, it pre-computes the entire address and accesses memory using
"MEM[reg]". Instead, we can force addr into a register and try to generate
"reg+offset", which is valid for targets like ARM. By doing this, we can:
  1) keep addr in loop-invariant form and hoist it later;
  2) save additional computation by taking advantage of the scaled addressing
     mode.
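
To make the idea concrete, here is a toy model in C of the decision described above. It is only a sketch under stated assumptions: struct expr, valid_address_p and choose_form are names made up for illustration and are not GCC's RTL or its real address validity check, and the target rule (REG, REG + CONST, or REG + (REG << n)) is an assumption loosely modelled on ARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy address expression.  A tiny stand-in for RTL, NOT GCC's real
   data structures.  */
enum kind { REG, CONST, PLUS, SHIFT };

struct expr {
  enum kind kind;
  int value;               /* Register number, constant, or shift count.  */
  const struct expr *op0;  /* PLUS: left operand; SHIFT: shifted operand.  */
  const struct expr *op1;  /* PLUS: right operand; unused otherwise.  */
};

/* Assumed target rule: a valid address is REG, REG + CONST,
   or REG + (REG << n).  */
static bool
valid_address_p (const struct expr *e)
{
  if (e->kind == REG)
    return true;
  if (e->kind != PLUS || e->op0->kind != REG)
    return false;
  if (e->op1->kind == CONST)
    return true;
  return e->op1->kind == SHIFT && e->op1->op0->kind == REG;
}

enum form {
  FORM_DIRECT,           /* addr+offset is already a valid address.  */
  FORM_REG_PLUS_OFFSET,  /* addr forced into a register, offset kept.  */
  FORM_MEM_REG           /* Whole address pre-computed into a register.  */
};

/* Model of the proposed fallback: if addr+offset is invalid and addr is
   a PLUS, pretend addr was copied into a fresh register and retry.  */
static enum form
choose_form (const struct expr *addr, const struct expr *offset)
{
  assert (addr != NULL && offset != NULL);

  struct expr sum = { PLUS, 0, addr, offset };
  if (valid_address_p (&sum))
    return FORM_DIRECT;

  if (addr->kind == PLUS)
    {
      /* Register number 100 is an arbitrary placeholder for a new
         pseudo register.  */
      struct expr tmp = { REG, 100, NULL, NULL };
      struct expr retry = { PLUS, 0, &tmp, offset };
      if (valid_address_p (&retry))
        return FORM_REG_PLUS_OFFSET;
    }

  return FORM_MEM_REG;
}
```

In this model, calling choose_form with addr = (sp + 2064) and offset = (r3 << 2) takes the "reg+offset" path, because the sum itself is not a valid address but a register plus the scaled index is. An addr that is a plain register combined with an offset the target cannot encode still falls back to "MEM[reg]".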

This issue has a substantial impact in ARM mode; Thumb2 also benefits, though
not as much. For example, taking the code from the bug entry, assembly like:

        blt     .L3
.L5:
        add     lr, sp, #2064            ////loop invariant
        add     r2, r2, #1
        add     r3, lr, r3, asl #2
        ldr     r3, [r3, #-2064]
        cmp     r3, #0
        bge     .L5
        uxtb    r2, r2

can be optimized into:

        blt     .L3
.L5:
        add     r2, r2, #1
        ldr     r3, [sp, r3, asl #2]
        cmp     r3, #0
        bge     .L5
        uxtb    r2, r2

Bootstrapped and tested on x86/arm. Any comments?

Thanks.
bin

2013-06-13  Bin Cheng  <bin.ch...@arm.com>

        PR target/57540
        * emit-rtl.c (offset_address): Try to force ADDR into register
        and generate reg+offset if addr+offset is invalid.

Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c (revision 199949)
+++ gcc/emit-rtl.c (working copy)
@@ -2175,15 +2175,20 @@ offset_address (rtx memref, rtx offset, unsigned H
 
   /* At this point we don't know _why_ the address is invalid.  It
      could have secondary memory references, multiplies or anything.
+     Yet we can try to force addr into register, in order to catch
+     the scaled addressing opportunity as "reg + scaled_offset".
 
-     However, if we did go and rearrange things, we can wind up not
+     Otherwise, if we did go and rearrange things, we can wind up not
      being able to recognize the magic around pic_offset_table_rtx.
      This stuff is fragile, and is yet another example of why it is
-     bad to expose PIC machinery too early.  */
+     bad to expose PIC machinery too early.  We may also wind up not
+     being able to recognize the scaled addressing pattern.
+
+     It won't hurt because the address here is invalid and we are
+     going to pre-compute it anyway.  */
   if (! memory_address_addr_space_p (GET_MODE (memref), new_rtx,
                                      attrs.addrspace)
-      && GET_CODE (addr) == PLUS
-      && XEXP (addr, 0) == pic_offset_table_rtx)
+      && GET_CODE (addr) == PLUS)
     {
       addr = force_reg (GET_MODE (addr), addr);
       new_rtx = simplify_gen_binary (PLUS, address_mode, addr, offset);