Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-22 Thread Oleg Endo
On Wed, 2013-06-19 at 22:42 +0800, Bin.Cheng wrote:
> On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo  wrote:
> >
> > No, I haven't disabled ivopt.
> >
> 
> But -fno-ivopts is specified in PR50749.
> With current implementation, auto-inc-dec iterates instructions
> backward, tries to find memory access and increment/decrement pairs.
> It will miss opportunities if instructions are interfered with each
> other.

Sorry for the confusion.  I used -fno.ivopts in the original examples in
the PR to emphasize one of the auto-inc-dec problems.  If ivopts is not
disabled there will be no auto-inc addressing modes at all.
I've added a comment in the PR clarifying this.

Cheers,
Oleg



Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-19 Thread Bin.Cheng
On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo  wrote:
> On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote:
>> On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo  wrote:
>> >
>> > My observation is, that legitimizing addressing modes in the backend by
>> > looking at one isolated address works, but doesn't give good results.
>> > In the SH backend there is this a particular case with displacement
>> > addressing (register + constant).  On SH displacements for byte
>> > addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
>> > sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
>> > heuristic to satisfy the displacement constraint and splits out a plus
>> > insn if needed to adjust the base address.  Of course that fixed
>> > heuristic doesn't work for some cases and thus sometimes results in
>> > unnecessary base address adjustments.
>> > I had the idea of replacing the current (partially defunct) auto-inc-dec
>> > RTL pass with a more generic addressing mode selection pass:
>> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590
>> >
>> > Any suggestions/comments/... are highly appreciated.
>> >
>> In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
>> Others seem just problems of wrong addressing mode.
>
> Yes, PR 50749 was the initial description of auto-inc-dec defects.  PR
> 52049 is also related to it, as the code examples there are candidates
> for post-inc addressing mode.  In that case, if 'int' is replaced with
> 'float' on SH post-inc is the optimal mode, because it doesn't have
> displacement addressing for FPU loads, except than SH2A.  Even then,
> using post-inc is better as it has a more compact instruction encoding.
> The current auto-inc-dec is not able to discover such opportunities if,
> for example, mem accesses are reordered by preceding optimization
> passes.
>
>> And one point on PR50749, auto-inc-dec depends on ivopt to choose
>> auto-increment candidate.  Since you disabled ivopt, I bet GCC will
>> miss lots of auto-increment opportunities.
>
> No, I haven't disabled ivopt.
>

But -fno-ivopts is specified in PR50749.
With current implementation, auto-inc-dec iterates instructions
backward, tries to find memory access and increment/decrement pairs.
It will miss opportunities if instructions are interfered with each
other.

Thanks.
bin
--
Best Regards.


Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-18 Thread Oleg Endo
On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote:
> On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo  wrote:
> >
> > My observation is, that legitimizing addressing modes in the backend by
> > looking at one isolated address works, but doesn't give good results.
> > In the SH backend there is this a particular case with displacement
> > addressing (register + constant).  On SH displacements for byte
> > addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
> > sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
> > heuristic to satisfy the displacement constraint and splits out a plus
> > insn if needed to adjust the base address.  Of course that fixed
> > heuristic doesn't work for some cases and thus sometimes results in
> > unnecessary base address adjustments.
> > I had the idea of replacing the current (partially defunct) auto-inc-dec
> > RTL pass with a more generic addressing mode selection pass:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590
> >
> > Any suggestions/comments/... are highly appreciated.
> >
> In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
> Others seem just problems of wrong addressing mode.

Yes, PR 50749 was the initial description of auto-inc-dec defects.  PR
52049 is also related to it, as the code examples there are candidates
for post-inc addressing mode.  In that case, if 'int' is replaced with
'float' on SH post-inc is the optimal mode, because it doesn't have
displacement addressing for FPU loads, except than SH2A.  Even then,
using post-inc is better as it has a more compact instruction encoding.
The current auto-inc-dec is not able to discover such opportunities if,
for example, mem accesses are reordered by preceding optimization
passes.

> And one point on PR50749, auto-inc-dec depends on ivopt to choose
> auto-increment candidate.  Since you disabled ivopt, I bet GCC will
> miss lots of auto-increment opportunities.

No, I haven't disabled ivopt.

Cheers,
Oleg



Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-18 Thread Bin.Cheng
On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo  wrote:
> On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote:
>> > My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
>> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
>> > suggest that this kind of transformation should be done in backend?  I can
>> > think of some disadvantages by doing it in backend:
>> > Different kinds of address expression might be wanted in different phase of
>> > compilation, for example, sometimes GCC wants to force computation of
>> > address expression out of memory access because it cannot CSE array
>> > indexing.  It's difficult to differentiate in backend.
>>
>> It's equally difficult in memory_address_addr_space and it affects all the
>> architectures at once...  You said in the original message that Thumb2 and 
>> ARM
>> modes are already not equally sensitive to it, so it's not unthinkable that
>> some architectures might not want it at all.  Therefore, yes, this should
>> preferably be handled in arm_legitimize_address.
>
> My observation is, that legitimizing addressing modes in the backend by
> looking at one isolated address works, but doesn't give good results.
> In the SH backend there is this a particular case with displacement
> addressing (register + constant).  On SH displacements for byte
> addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
> sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
> heuristic to satisfy the displacement constraint and splits out a plus
> insn if needed to adjust the base address.  Of course that fixed
> heuristic doesn't work for some cases and thus sometimes results in
> unnecessary base address adjustments.
> I had the idea of replacing the current (partially defunct) auto-inc-dec
> RTL pass with a more generic addressing mode selection pass:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590
>
> Any suggestions/comments/... are highly appreciated.
>
In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
Others seem just problems of wrong addressing mode.

And one point on PR50749, auto-inc-dec depends on ivopt to choose
auto-increment candidate.  Since you disabled ivopt, I bet GCC will
miss lots of auto-increment opportunities.

Thanks.
bin
--
Best Regards.


Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Oleg Endo
On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote:
> > My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> > suggest that this kind of transformation should be done in backend?  I can
> > think of some disadvantages by doing it in backend:
> > Different kinds of address expression might be wanted in different phase of
> > compilation, for example, sometimes GCC wants to force computation of
> > address expression out of memory access because it cannot CSE array
> > indexing.  It's difficult to differentiate in backend.
> 
> It's equally difficult in memory_address_addr_space and it affects all the 
> architectures at once...  You said in the original message that Thumb2 and 
> ARM 
> modes are already not equally sensitive to it, so it's not unthinkable that 
> some architectures might not want it at all.  Therefore, yes, this should 
> preferably be handled in arm_legitimize_address.

My observation is, that legitimizing addressing modes in the backend by
looking at one isolated address works, but doesn't give good results.
In the SH backend there is this a particular case with displacement
addressing (register + constant).  On SH displacements for byte
addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
heuristic to satisfy the displacement constraint and splits out a plus
insn if needed to adjust the base address.  Of course that fixed
heuristic doesn't work for some cases and thus sometimes results in
unnecessary base address adjustments.
I had the idea of replacing the current (partially defunct) auto-inc-dec
RTL pass with a more generic addressing mode selection pass:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590

Any suggestions/comments/... are highly appreciated.

Cheers,
Oleg



RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng


> -Original Message-
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, June 17, 2013 4:42 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
> when expanding array reference
> 
> > My mistake. It's because arm_legitimize_address cannot re-factor
> > "[r105 +
> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> > suggest that this kind of transformation should be done in backend?  I
> > can think of some disadvantages by doing it in backend:
> > Different kinds of address expression might be wanted in different
> > phase of compilation, for example, sometimes GCC wants to force
> > computation of address expression out of memory access because it
> > cannot CSE array indexing.  It's difficult to differentiate in backend.
> 
> It's equally difficult in memory_address_addr_space and it affects all the
> architectures at once...  You said in the original message that Thumb2 and
ARM
> modes are already not equally sensitive to it, so it's not unthinkable
that
> some architectures might not want it at all.  Therefore, yes, this should
> preferably be handled in arm_legitimize_address.

I was thinking it would be more accurate to capture the scaled_offset
without over-kill by doing it in offset_address.  Only problem I can image
is the change forces addr into register, which might nullifies backend
dependent transformation.
I will try to do it in arm_legitimize_address and see what's the result.

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou
> My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
> r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> suggest that this kind of transformation should be done in backend?  I can
> think of some disadvantages by doing it in backend:
> Different kinds of address expression might be wanted in different phase of
> compilation, for example, sometimes GCC wants to force computation of
> address expression out of memory access because it cannot CSE array
> indexing.  It's difficult to differentiate in backend.

It's equally difficult in memory_address_addr_space and it affects all the 
architectures at once...  You said in the original message that Thumb2 and ARM 
modes are already not equally sensitive to it, so it's not unthinkable that 
some architectures might not want it at all.  Therefore, yes, this should 
preferably be handled in arm_legitimize_address.

-- 
Eric Botcazou


RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng


> -Original Message-
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, June 17, 2013 3:32 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
> when expanding array reference
> 
> > The problem occurs when accessing local array element. For example, #
> > VUSE <.MEM_27>
> > k_8 = parent[k_29];
> >
> > GCC calculates the address in three steps:
> > 1) addr is calculated as "r105 + (-2064)".
> > 2) offset is calculated as "r165*4".
> > 3) calls offset_address to combine the address into "r105+ r165*4 +
> > (-2064)".
> >
> > Since ADDR is valid and there is no call to memory_address_addr_space
> > in offset_address, the invalid address expression has no chance to go
> > through target dependent legitimization function.
> 
> But offset_address calls change_address_1 with validate set to 1 so during
RTL
> expansion memory_address_addr_space should be invoked on the invalid
address.
Yes, I missed that part.

> 
> > Even there is a chance, the
> > current implementation of memory_address_addr_space prevents the
> > scaled address expression from being generated because of below code:
> >   if (! cse_not_expected && !REG_P (x))
> >   x = break_out_memory_refs (x);
> 
> Where are the memory references in the above address?
My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
suggest that this kind of transformation should be done in backend?  I can
think of some disadvantages by doing it in backend:
Different kinds of address expression might be wanted in different phase of
compilation, for example, sometimes GCC wants to force computation of
address expression out of memory access because it cannot CSE array
indexing.  It's difficult to differentiate in backend.

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou
> The problem occurs when accessing local array element. For example,
> # VUSE <.MEM_27>
> k_8 = parent[k_29];
> 
> GCC calculates the address in three steps:
> 1) addr is calculated as "r105 + (-2064)".
> 2) offset is calculated as "r165*4".
> 3) calls offset_address to combine the address into "r105+ r165*4 +
> (-2064)".
> 
> Since ADDR is valid and there is no call to memory_address_addr_space in
> offset_address, the invalid address expression has no chance to go through
> target dependent legitimization function.

But offset_address calls change_address_1 with validate set to 1 so during RTL 
expansion memory_address_addr_space should be invoked on the invalid address.

> Even there is a chance, the
> current implementation of memory_address_addr_space prevents the scaled
> address expression from being generated because of below code:
>   if (! cse_not_expected && !REG_P (x))
> x = break_out_memory_refs (x);

Where are the memory references in the above address?

-- 
Eric Botcazou


RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-16 Thread Bin Cheng


> -Original Message-
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Saturday, June 15, 2013 5:37 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
> when expanding array reference
> 
> > As reported in pr57540, gcc chooses bad address mode, resulting in A)
> > invariant part of address expression is not kept or hoisted; b)
> > additional computation which should be encoded in address expression.
> > The reason is when gcc runs into "addr+offset" (which is invalid)
> > during expanding, it pre-computes the entire address and accesses memory
> unit using "MEM[reg]".
> > Yet we can force addr into register and try to generate "reg+offset"
> > which is valid for targets like ARM.  By doing this, we can:
> > 1) keep addr in loop invariant form and hoist it later;
> > 2) saving additional computation by taking advantage of scaled
> > addressing mode;
> 
> Does the invalid address not go through arm_legitimize_address from here?
> 
>   /* Perform machine-dependent transformations on X
>in certain cases.  This is not necessary since the code
>below can handle all possible cases, but machine-dependent
>transformations can make better code.  */
>   {
>   rtx orig_x = x;
>   x = targetm.addr_space.legitimize_address (x, oldx, mode, as);
>   if (orig_x != x && memory_address_addr_space_p (mode, x, as))
> goto done;
>   }
> 

Hi Eric,

The problem occurs when accessing local array element. For example,
# VUSE <.MEM_27>
k_8 = parent[k_29];

GCC calculates the address in three steps:
1) addr is calculated as "r105 + (-2064)".
2) offset is calculated as "r165*4".
3) calls offset_address to combine the address into "r105+ r165*4 +
(-2064)".

Since ADDR is valid and there is no call to memory_address_addr_space in
offset_address, the invalid address expression has no chance to go through
target dependent legitimization function.  Even there is a chance, the
current implementation of memory_address_addr_space prevents the scaled
address expression from being generated because of below code:
  if (! cse_not_expected && !REG_P (x))
  x = break_out_memory_refs (x);

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-15 Thread Eric Botcazou
> As reported in pr57540, gcc chooses bad address mode, resulting in A)
> invariant part of address expression is not kept or hoisted; b) additional
> computation which should be encoded in address expression.  The reason is
> when gcc runs into "addr+offset" (which is invalid) during expanding, it
> pre-computes the entire address and accesses memory unit using "MEM[reg]".
> Yet we can force addr into register and try to generate "reg+offset" which
> is valid for targets like ARM.  By doing this, we can:
> 1) keep addr in loop invariant form and hoist it later;
> 2) saving additional computation by taking advantage of scaled addressing
> mode;

Does the invalid address not go through arm_legitimize_address from here?

  /* Perform machine-dependent transformations on X
 in certain cases.  This is not necessary since the code
 below can handle all possible cases, but machine-dependent
 transformations can make better code.  */
  {
rtx orig_x = x;
x = targetm.addr_space.legitimize_address (x, oldx, mode, as);
if (orig_x != x && memory_address_addr_space_p (mode, x, as))
  goto done;
  }

-- 
Eric Botcazou


[PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-14 Thread Bin Cheng
Hi,
As reported in pr57540, gcc chooses bad address mode, resulting in A)
invariant part of address expression is not kept or hoisted; b) additional
computation which should be encoded in address expression.  The reason is
when gcc runs into "addr+offset" (which is invalid) during expanding, it
pre-computes the entire address and accesses memory unit using "MEM[reg]".
Yet we can force addr into register and try to generate "reg+offset" which
is valid for targets like ARM.  By doing this, we can:
1) keep addr in loop invariant form and hoist it later;
2) saving additional computation by taking advantage of scaled addressing
mode;

This issue has substantial impact for ARM mode, and also benefits Thumb2
although not so much as ARM mode.  For example from the bug entry, assembly
code like:
blt .L3
.L5:
add lr, sp, #2064loop invariant
add r2, r2, #1
add r3, lr, r3, asl #2
ldr r3, [r3, #-2064]
cmp r3, #0
bge .L5
uxtbr2, r2

can be optimized into:

blt .L3 
.L5:
add r2, r2, #1
ldr r3, [sp, r3, asl #2]
cmp r3, #0
bge .L5
uxtbr2, r2

Bootstrap and test on x86/arm, any comments?

Thanks.
bin

2013-06-13  Bin Cheng  

PR target/57540
* emit-rtl.c (offset_address): Try to force ADDR into register and
generate reg+offset if addr+offset is invalid.
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 199949)
+++ gcc/emit-rtl.c  (working copy)
@@ -2175,15 +2175,20 @@ offset_address (rtx memref, rtx offset, unsigned H
 
   /* At this point we don't know _why_ the address is invalid.  It
  could have secondary memory references, multiplies or anything.
+ Yet we can try to force addr into register, in order to catch
+ the scaled addressing opportunity as "reg + scaled_offset".
 
- However, if we did go and rearrange things, we can wind up not
+ Otherwise, if we did go and rearrange things, we can wind up not
  being able to recognize the magic around pic_offset_table_rtx.
  This stuff is fragile, and is yet another example of why it is
- bad to expose PIC machinery too early.  */
+ bad to expose PIC machinery too early.  We may also wind up not
+ being able to recognize the scaled addressing pattern.
+
+ It won't hurt because the address here is invalid and we are
+ going to pre-compute it anyway.  */
   if (! memory_address_addr_space_p (GET_MODE (memref), new_rtx,
 attrs.addrspace)
-  && GET_CODE (addr) == PLUS
-  && XEXP (addr, 0) == pic_offset_table_rtx)
+  && GET_CODE (addr) == PLUS)
 {
   addr = force_reg (GET_MODE (addr), addr);
   new_rtx = simplify_gen_binary (PLUS, address_mode, addr, offset);