Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Wed, 2013-06-19 at 22:42 +0800, Bin.Cheng wrote: > On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo wrote: > > > > No, I haven't disabled ivopt. > > > > But -fno-ivopts is specified in PR50749. > With current implementation, auto-inc-dec iterates instructions > backward, tries to find memory access and increment/decrement pairs. > It will miss opportunities if instructions are interfered with each > other. Sorry for the confusion. I used -fno.ivopts in the original examples in the PR to emphasize one of the auto-inc-dec problems. If ivopts is not disabled there will be no auto-inc addressing modes at all. I've added a comment in the PR clarifying this. Cheers, Oleg
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo wrote: > On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote: >> On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo wrote: >> > >> > My observation is, that legitimizing addressing modes in the backend by >> > looking at one isolated address works, but doesn't give good results. >> > In the SH backend there is this a particular case with displacement >> > addressing (register + constant). On SH displacements for byte >> > addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. >> > sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed >> > heuristic to satisfy the displacement constraint and splits out a plus >> > insn if needed to adjust the base address. Of course that fixed >> > heuristic doesn't work for some cases and thus sometimes results in >> > unnecessary base address adjustments. >> > I had the idea of replacing the current (partially defunct) auto-inc-dec >> > RTL pass with a more generic addressing mode selection pass: >> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 >> > >> > Any suggestions/comments/... are highly appreciated. >> > >> In PR56590, is PR50749 the only one that correlate with auto-inc-dec? >> Others seem just problems of wrong addressing mode. > > Yes, PR 50749 was the initial description of auto-inc-dec defects. PR > 52049 is also related to it, as the code examples there are candidates > for post-inc addressing mode. In that case, if 'int' is replaced with > 'float' on SH post-inc is the optimal mode, because it doesn't have > displacement addressing for FPU loads, except than SH2A. Even then, > using post-inc is better as it has a more compact instruction encoding. > The current auto-inc-dec is not able to discover such opportunities if, > for example, mem accesses are reordered by preceding optimization > passes. > >> And one point on PR50749, auto-inc-dec depends on ivopt to choose >> auto-increment candidate. Since you disabled ivopt, I bet GCC will >> miss lots of auto-increment opportunities. > > No, I haven't disabled ivopt. > But -fno-ivopts is specified in PR50749. With current implementation, auto-inc-dec iterates instructions backward, tries to find memory access and increment/decrement pairs. It will miss opportunities if instructions are interfered with each other. Thanks. bin -- Best Regards.
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote: > On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo wrote: > > > > My observation is, that legitimizing addressing modes in the backend by > > looking at one isolated address works, but doesn't give good results. > > In the SH backend there is this a particular case with displacement > > addressing (register + constant). On SH displacements for byte > > addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. > > sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed > > heuristic to satisfy the displacement constraint and splits out a plus > > insn if needed to adjust the base address. Of course that fixed > > heuristic doesn't work for some cases and thus sometimes results in > > unnecessary base address adjustments. > > I had the idea of replacing the current (partially defunct) auto-inc-dec > > RTL pass with a more generic addressing mode selection pass: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 > > > > Any suggestions/comments/... are highly appreciated. > > > In PR56590, is PR50749 the only one that correlate with auto-inc-dec? > Others seem just problems of wrong addressing mode. Yes, PR 50749 was the initial description of auto-inc-dec defects. PR 52049 is also related to it, as the code examples there are candidates for post-inc addressing mode. In that case, if 'int' is replaced with 'float' on SH post-inc is the optimal mode, because it doesn't have displacement addressing for FPU loads, except than SH2A. Even then, using post-inc is better as it has a more compact instruction encoding. The current auto-inc-dec is not able to discover such opportunities if, for example, mem accesses are reordered by preceding optimization passes. > And one point on PR50749, auto-inc-dec depends on ivopt to choose > auto-increment candidate. Since you disabled ivopt, I bet GCC will > miss lots of auto-increment opportunities. No, I haven't disabled ivopt. Cheers, Oleg
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo wrote: > On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote: >> > My mistake. It's because arm_legitimize_address cannot re-factor "[r105 + >> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]". Do you >> > suggest that this kind of transformation should be done in backend? I can >> > think of some disadvantages by doing it in backend: >> > Different kinds of address expression might be wanted in different phase of >> > compilation, for example, sometimes GCC wants to force computation of >> > address expression out of memory access because it cannot CSE array >> > indexing. It's difficult to differentiate in backend. >> >> It's equally difficult in memory_address_addr_space and it affects all the >> architectures at once... You said in the original message that Thumb2 and >> ARM >> modes are already not equally sensitive to it, so it's not unthinkable that >> some architectures might not want it at all. Therefore, yes, this should >> preferably be handled in arm_legitimize_address. > > My observation is, that legitimizing addressing modes in the backend by > looking at one isolated address works, but doesn't give good results. > In the SH backend there is this a particular case with displacement > addressing (register + constant). On SH displacements for byte > addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. > sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed > heuristic to satisfy the displacement constraint and splits out a plus > insn if needed to adjust the base address. Of course that fixed > heuristic doesn't work for some cases and thus sometimes results in > unnecessary base address adjustments. > I had the idea of replacing the current (partially defunct) auto-inc-dec > RTL pass with a more generic addressing mode selection pass: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 > > Any suggestions/comments/... are highly appreciated. > In PR56590, is PR50749 the only one that correlate with auto-inc-dec? Others seem just problems of wrong addressing mode. And one point on PR50749, auto-inc-dec depends on ivopt to choose auto-increment candidate. Since you disabled ivopt, I bet GCC will miss lots of auto-increment opportunities. Thanks. bin -- Best Regards.
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote: > > My mistake. It's because arm_legitimize_address cannot re-factor "[r105 + > > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]". Do you > > suggest that this kind of transformation should be done in backend? I can > > think of some disadvantages by doing it in backend: > > Different kinds of address expression might be wanted in different phase of > > compilation, for example, sometimes GCC wants to force computation of > > address expression out of memory access because it cannot CSE array > > indexing. It's difficult to differentiate in backend. > > It's equally difficult in memory_address_addr_space and it affects all the > architectures at once... You said in the original message that Thumb2 and > ARM > modes are already not equally sensitive to it, so it's not unthinkable that > some architectures might not want it at all. Therefore, yes, this should > preferably be handled in arm_legitimize_address. My observation is, that legitimizing addressing modes in the backend by looking at one isolated address works, but doesn't give good results. In the SH backend there is this a particular case with displacement addressing (register + constant). On SH displacements for byte addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words. sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed heuristic to satisfy the displacement constraint and splits out a plus insn if needed to adjust the base address. Of course that fixed heuristic doesn't work for some cases and thus sometimes results in unnecessary base address adjustments. I had the idea of replacing the current (partially defunct) auto-inc-dec RTL pass with a more generic addressing mode selection pass: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590 Any suggestions/comments/... are highly appreciated. Cheers, Oleg
RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> -Original Message- > From: Eric Botcazou [mailto:ebotca...@adacore.com] > Sent: Monday, June 17, 2013 4:42 PM > To: Bin Cheng > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode > when expanding array reference > > > My mistake. It's because arm_legitimize_address cannot re-factor > > "[r105 + > > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]". Do you > > suggest that this kind of transformation should be done in backend? I > > can think of some disadvantages by doing it in backend: > > Different kinds of address expression might be wanted in different > > phase of compilation, for example, sometimes GCC wants to force > > computation of address expression out of memory access because it > > cannot CSE array indexing. It's difficult to differentiate in backend. > > It's equally difficult in memory_address_addr_space and it affects all the > architectures at once... You said in the original message that Thumb2 and ARM > modes are already not equally sensitive to it, so it's not unthinkable that > some architectures might not want it at all. Therefore, yes, this should > preferably be handled in arm_legitimize_address. I was thinking it would be more accurate to capture the scaled_offset without over-kill by doing it in offset_address. Only problem I can image is the change forces addr into register, which might nullifies backend dependent transformation. I will try to do it in arm_legitimize_address and see what's the result. Thanks. bin
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> My mistake. It's because arm_legitimize_address cannot re-factor "[r105 + > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]". Do you > suggest that this kind of transformation should be done in backend? I can > think of some disadvantages by doing it in backend: > Different kinds of address expression might be wanted in different phase of > compilation, for example, sometimes GCC wants to force computation of > address expression out of memory access because it cannot CSE array > indexing. It's difficult to differentiate in backend. It's equally difficult in memory_address_addr_space and it affects all the architectures at once... You said in the original message that Thumb2 and ARM modes are already not equally sensitive to it, so it's not unthinkable that some architectures might not want it at all. Therefore, yes, this should preferably be handled in arm_legitimize_address. -- Eric Botcazou
RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> -Original Message- > From: Eric Botcazou [mailto:ebotca...@adacore.com] > Sent: Monday, June 17, 2013 3:32 PM > To: Bin Cheng > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode > when expanding array reference > > > The problem occurs when accessing local array element. For example, # > > VUSE <.MEM_27> > > k_8 = parent[k_29]; > > > > GCC calculates the address in three steps: > > 1) addr is calculated as "r105 + (-2064)". > > 2) offset is calculated as "r165*4". > > 3) calls offset_address to combine the address into "r105+ r165*4 + > > (-2064)". > > > > Since ADDR is valid and there is no call to memory_address_addr_space > > in offset_address, the invalid address expression has no chance to go > > through target dependent legitimization function. > > But offset_address calls change_address_1 with validate set to 1 so during RTL > expansion memory_address_addr_space should be invoked on the invalid address. Yes, I missed that part. > > > Even there is a chance, the > > current implementation of memory_address_addr_space prevents the > > scaled address expression from being generated because of below code: > > if (! cse_not_expected && !REG_P (x)) > > x = break_out_memory_refs (x); > > Where are the memory references in the above address? My mistake. It's because arm_legitimize_address cannot re-factor "[r105 + r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]". Do you suggest that this kind of transformation should be done in backend? I can think of some disadvantages by doing it in backend: Different kinds of address expression might be wanted in different phase of compilation, for example, sometimes GCC wants to force computation of address expression out of memory access because it cannot CSE array indexing. It's difficult to differentiate in backend. Thanks. bin
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> The problem occurs when accessing local array element. For example, > # VUSE <.MEM_27> > k_8 = parent[k_29]; > > GCC calculates the address in three steps: > 1) addr is calculated as "r105 + (-2064)". > 2) offset is calculated as "r165*4". > 3) calls offset_address to combine the address into "r105+ r165*4 + > (-2064)". > > Since ADDR is valid and there is no call to memory_address_addr_space in > offset_address, the invalid address expression has no chance to go through > target dependent legitimization function. But offset_address calls change_address_1 with validate set to 1 so during RTL expansion memory_address_addr_space should be invoked on the invalid address. > Even there is a chance, the > current implementation of memory_address_addr_space prevents the scaled > address expression from being generated because of below code: > if (! cse_not_expected && !REG_P (x)) > x = break_out_memory_refs (x); Where are the memory references in the above address? -- Eric Botcazou
RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> -Original Message- > From: Eric Botcazou [mailto:ebotca...@adacore.com] > Sent: Saturday, June 15, 2013 5:37 PM > To: Bin Cheng > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode > when expanding array reference > > > As reported in pr57540, gcc chooses bad address mode, resulting in A) > > invariant part of address expression is not kept or hoisted; b) > > additional computation which should be encoded in address expression. > > The reason is when gcc runs into "addr+offset" (which is invalid) > > during expanding, it pre-computes the entire address and accesses memory > unit using "MEM[reg]". > > Yet we can force addr into register and try to generate "reg+offset" > > which is valid for targets like ARM. By doing this, we can: > > 1) keep addr in loop invariant form and hoist it later; > > 2) saving additional computation by taking advantage of scaled > > addressing mode; > > Does the invalid address not go through arm_legitimize_address from here? > > /* Perform machine-dependent transformations on X >in certain cases. This is not necessary since the code >below can handle all possible cases, but machine-dependent >transformations can make better code. */ > { > rtx orig_x = x; > x = targetm.addr_space.legitimize_address (x, oldx, mode, as); > if (orig_x != x && memory_address_addr_space_p (mode, x, as)) > goto done; > } > Hi Eric, The problem occurs when accessing local array element. For example, # VUSE <.MEM_27> k_8 = parent[k_29]; GCC calculates the address in three steps: 1) addr is calculated as "r105 + (-2064)". 2) offset is calculated as "r165*4". 3) calls offset_address to combine the address into "r105+ r165*4 + (-2064)". Since ADDR is valid and there is no call to memory_address_addr_space in offset_address, the invalid address expression has no chance to go through target dependent legitimization function. Even there is a chance, the current implementation of memory_address_addr_space prevents the scaled address expression from being generated because of below code: if (! cse_not_expected && !REG_P (x)) x = break_out_memory_refs (x); Thanks. bin
Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
> As reported in pr57540, gcc chooses bad address mode, resulting in A) > invariant part of address expression is not kept or hoisted; b) additional > computation which should be encoded in address expression. The reason is > when gcc runs into "addr+offset" (which is invalid) during expanding, it > pre-computes the entire address and accesses memory unit using "MEM[reg]". > Yet we can force addr into register and try to generate "reg+offset" which > is valid for targets like ARM. By doing this, we can: > 1) keep addr in loop invariant form and hoist it later; > 2) saving additional computation by taking advantage of scaled addressing > mode; Does the invalid address not go through arm_legitimize_address from here? /* Perform machine-dependent transformations on X in certain cases. This is not necessary since the code below can handle all possible cases, but machine-dependent transformations can make better code. */ { rtx orig_x = x; x = targetm.addr_space.legitimize_address (x, oldx, mode, as); if (orig_x != x && memory_address_addr_space_p (mode, x, as)) goto done; } -- Eric Botcazou
[PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference
Hi, As reported in pr57540, gcc chooses bad address mode, resulting in A) invariant part of address expression is not kept or hoisted; b) additional computation which should be encoded in address expression. The reason is when gcc runs into "addr+offset" (which is invalid) during expanding, it pre-computes the entire address and accesses memory unit using "MEM[reg]". Yet we can force addr into register and try to generate "reg+offset" which is valid for targets like ARM. By doing this, we can: 1) keep addr in loop invariant form and hoist it later; 2) saving additional computation by taking advantage of scaled addressing mode; This issue has substantial impact for ARM mode, and also benefits Thumb2 although not so much as ARM mode. For example from the bug entry, assembly code like: blt .L3 .L5: add lr, sp, #2064loop invariant add r2, r2, #1 add r3, lr, r3, asl #2 ldr r3, [r3, #-2064] cmp r3, #0 bge .L5 uxtbr2, r2 can be optimized into: blt .L3 .L5: add r2, r2, #1 ldr r3, [sp, r3, asl #2] cmp r3, #0 bge .L5 uxtbr2, r2 Bootstrap and test on x86/arm, any comments? Thanks. bin 2013-06-13 Bin Cheng PR target/57540 * emit-rtl.c (offset_address): Try to force ADDR into register and generate reg+offset if addr+offset is invalid. Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 199949) +++ gcc/emit-rtl.c (working copy) @@ -2175,15 +2175,20 @@ offset_address (rtx memref, rtx offset, unsigned H /* At this point we don't know _why_ the address is invalid. It could have secondary memory references, multiplies or anything. + Yet we can try to force addr into register, in order to catch + the scaled addressing opportunity as "reg + scaled_offset". - However, if we did go and rearrange things, we can wind up not + Otherwise, if we did go and rearrange things, we can wind up not being able to recognize the magic around pic_offset_table_rtx. This stuff is fragile, and is yet another example of why it is - bad to expose PIC machinery too early. */ + bad to expose PIC machinery too early. We may also wind up not + being able to recognize the scaled addressing pattern. + + It won't hurt because the address here is invalid and we are + going to pre-compute it anyway. */ if (! memory_address_addr_space_p (GET_MODE (memref), new_rtx, attrs.addrspace) - && GET_CODE (addr) == PLUS - && XEXP (addr, 0) == pic_offset_table_rtx) + && GET_CODE (addr) == PLUS) { addr = force_reg (GET_MODE (addr), addr); new_rtx = simplify_gen_binary (PLUS, address_mode, addr, offset);