Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-22 Thread Oleg Endo
On Wed, 2013-06-19 at 22:42 +0800, Bin.Cheng wrote:
 On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo oleg.e...@t-online.de wrote:
 
  No, I haven't disabled ivopt.
 
 
 But -fno-ivopts is specified in PR50749.
 With current implementation, auto-inc-dec iterates instructions
 backward, tries to find memory access and increment/decrement pairs.
 It will miss opportunities if instructions are interfered with each
 other.

Sorry for the confusion.  I used -fno.ivopts in the original examples in
the PR to emphasize one of the auto-inc-dec problems.  If ivopts is not
disabled there will be no auto-inc addressing modes at all.
I've added a comment in the PR clarifying this.

Cheers,
Oleg



Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-19 Thread Bin.Cheng
On Tue, Jun 18, 2013 at 10:02 PM, Oleg Endo oleg.e...@t-online.de wrote:
 On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote:
 On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo oleg.e...@t-online.de wrote:
 
  My observation is, that legitimizing addressing modes in the backend by
  looking at one isolated address works, but doesn't give good results.
  In the SH backend there is this a particular case with displacement
  addressing (register + constant).  On SH displacements for byte
  addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
  sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
  heuristic to satisfy the displacement constraint and splits out a plus
  insn if needed to adjust the base address.  Of course that fixed
  heuristic doesn't work for some cases and thus sometimes results in
  unnecessary base address adjustments.
  I had the idea of replacing the current (partially defunct) auto-inc-dec
  RTL pass with a more generic addressing mode selection pass:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590
 
  Any suggestions/comments/... are highly appreciated.
 
 In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
 Others seem just problems of wrong addressing mode.

 Yes, PR 50749 was the initial description of auto-inc-dec defects.  PR
 52049 is also related to it, as the code examples there are candidates
 for post-inc addressing mode.  In that case, if 'int' is replaced with
 'float' on SH post-inc is the optimal mode, because it doesn't have
 displacement addressing for FPU loads, except than SH2A.  Even then,
 using post-inc is better as it has a more compact instruction encoding.
 The current auto-inc-dec is not able to discover such opportunities if,
 for example, mem accesses are reordered by preceding optimization
 passes.

 And one point on PR50749, auto-inc-dec depends on ivopt to choose
 auto-increment candidate.  Since you disabled ivopt, I bet GCC will
 miss lots of auto-increment opportunities.

 No, I haven't disabled ivopt.


But -fno-ivopts is specified in PR50749.
With current implementation, auto-inc-dec iterates instructions
backward, tries to find memory access and increment/decrement pairs.
It will miss opportunities if instructions are interfered with each
other.

Thanks.
bin
--
Best Regards.


Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-18 Thread Bin.Cheng
On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo oleg.e...@t-online.de wrote:
 On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote:
  My mistake. It's because arm_legitimize_address cannot re-factor [r105 +
  r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4].  Do you
  suggest that this kind of transformation should be done in backend?  I can
  think of some disadvantages by doing it in backend:
  Different kinds of address expression might be wanted in different phase of
  compilation, for example, sometimes GCC wants to force computation of
  address expression out of memory access because it cannot CSE array
  indexing.  It's difficult to differentiate in backend.

 It's equally difficult in memory_address_addr_space and it affects all the
 architectures at once...  You said in the original message that Thumb2 and 
 ARM
 modes are already not equally sensitive to it, so it's not unthinkable that
 some architectures might not want it at all.  Therefore, yes, this should
 preferably be handled in arm_legitimize_address.

 My observation is, that legitimizing addressing modes in the backend by
 looking at one isolated address works, but doesn't give good results.
 In the SH backend there is this a particular case with displacement
 addressing (register + constant).  On SH displacements for byte
 addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
 sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
 heuristic to satisfy the displacement constraint and splits out a plus
 insn if needed to adjust the base address.  Of course that fixed
 heuristic doesn't work for some cases and thus sometimes results in
 unnecessary base address adjustments.
 I had the idea of replacing the current (partially defunct) auto-inc-dec
 RTL pass with a more generic addressing mode selection pass:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590

 Any suggestions/comments/... are highly appreciated.

In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
Others seem just problems of wrong addressing mode.

And one point on PR50749, auto-inc-dec depends on ivopt to choose
auto-increment candidate.  Since you disabled ivopt, I bet GCC will
miss lots of auto-increment opportunities.

Thanks.
bin
--
Best Regards.


Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-18 Thread Oleg Endo
On Tue, 2013-06-18 at 18:09 +0800, Bin.Cheng wrote:
 On Tue, Jun 18, 2013 at 3:52 AM, Oleg Endo oleg.e...@t-online.de wrote:
 
  My observation is, that legitimizing addressing modes in the backend by
  looking at one isolated address works, but doesn't give good results.
  In the SH backend there is this a particular case with displacement
  addressing (register + constant).  On SH displacements for byte
  addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
  sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
  heuristic to satisfy the displacement constraint and splits out a plus
  insn if needed to adjust the base address.  Of course that fixed
  heuristic doesn't work for some cases and thus sometimes results in
  unnecessary base address adjustments.
  I had the idea of replacing the current (partially defunct) auto-inc-dec
  RTL pass with a more generic addressing mode selection pass:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590
 
  Any suggestions/comments/... are highly appreciated.
 
 In PR56590, is PR50749 the only one that correlate with auto-inc-dec?
 Others seem just problems of wrong addressing mode.

Yes, PR 50749 was the initial description of auto-inc-dec defects.  PR
52049 is also related to it, as the code examples there are candidates
for post-inc addressing mode.  In that case, if 'int' is replaced with
'float' on SH post-inc is the optimal mode, because it doesn't have
displacement addressing for FPU loads, except than SH2A.  Even then,
using post-inc is better as it has a more compact instruction encoding.
The current auto-inc-dec is not able to discover such opportunities if,
for example, mem accesses are reordered by preceding optimization
passes.

 And one point on PR50749, auto-inc-dec depends on ivopt to choose
 auto-increment candidate.  Since you disabled ivopt, I bet GCC will
 miss lots of auto-increment opportunities.

No, I haven't disabled ivopt.

Cheers,
Oleg



Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou
 The problem occurs when accessing local array element. For example,
 # VUSE .MEM_27
 k_8 = parent[k_29];
 
 GCC calculates the address in three steps:
 1) addr is calculated as r105 + (-2064).
 2) offset is calculated as r165*4.
 3) calls offset_address to combine the address into r105+ r165*4 +
 (-2064).
 
 Since ADDR is valid and there is no call to memory_address_addr_space in
 offset_address, the invalid address expression has no chance to go through
 target dependent legitimization function.

But offset_address calls change_address_1 with validate set to 1 so during RTL 
expansion memory_address_addr_space should be invoked on the invalid address.

 Even there is a chance, the
 current implementation of memory_address_addr_space prevents the scaled
 address expression from being generated because of below code:
   if (! cse_not_expected  !REG_P (x))
 x = break_out_memory_refs (x);

Where are the memory references in the above address?

-- 
Eric Botcazou


RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng


 -Original Message-
 From: Eric Botcazou [mailto:ebotca...@adacore.com]
 Sent: Monday, June 17, 2013 3:32 PM
 To: Bin Cheng
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
 when expanding array reference
 
  The problem occurs when accessing local array element. For example, #
  VUSE .MEM_27
  k_8 = parent[k_29];
 
  GCC calculates the address in three steps:
  1) addr is calculated as r105 + (-2064).
  2) offset is calculated as r165*4.
  3) calls offset_address to combine the address into r105+ r165*4 +
  (-2064).
 
  Since ADDR is valid and there is no call to memory_address_addr_space
  in offset_address, the invalid address expression has no chance to go
  through target dependent legitimization function.
 
 But offset_address calls change_address_1 with validate set to 1 so during
RTL
 expansion memory_address_addr_space should be invoked on the invalid
address.
Yes, I missed that part.

 
  Even there is a chance, the
  current implementation of memory_address_addr_space prevents the
  scaled address expression from being generated because of below code:
if (! cse_not_expected  !REG_P (x))
x = break_out_memory_refs (x);
 
 Where are the memory references in the above address?
My mistake. It's because arm_legitimize_address cannot re-factor [r105 +
r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4].  Do you
suggest that this kind of transformation should be done in backend?  I can
think of some disadvantages by doing it in backend:
Different kinds of address expression might be wanted in different phase of
compilation, for example, sometimes GCC wants to force computation of
address expression out of memory access because it cannot CSE array
indexing.  It's difficult to differentiate in backend.

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou
 My mistake. It's because arm_legitimize_address cannot re-factor [r105 +
 r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4].  Do you
 suggest that this kind of transformation should be done in backend?  I can
 think of some disadvantages by doing it in backend:
 Different kinds of address expression might be wanted in different phase of
 compilation, for example, sometimes GCC wants to force computation of
 address expression out of memory access because it cannot CSE array
 indexing.  It's difficult to differentiate in backend.

It's equally difficult in memory_address_addr_space and it affects all the 
architectures at once...  You said in the original message that Thumb2 and ARM 
modes are already not equally sensitive to it, so it's not unthinkable that 
some architectures might not want it at all.  Therefore, yes, this should 
preferably be handled in arm_legitimize_address.

-- 
Eric Botcazou


RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng


 -Original Message-
 From: Eric Botcazou [mailto:ebotca...@adacore.com]
 Sent: Monday, June 17, 2013 4:42 PM
 To: Bin Cheng
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
 when expanding array reference
 
  My mistake. It's because arm_legitimize_address cannot re-factor
  [r105 +
  r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4].  Do you
  suggest that this kind of transformation should be done in backend?  I
  can think of some disadvantages by doing it in backend:
  Different kinds of address expression might be wanted in different
  phase of compilation, for example, sometimes GCC wants to force
  computation of address expression out of memory access because it
  cannot CSE array indexing.  It's difficult to differentiate in backend.
 
 It's equally difficult in memory_address_addr_space and it affects all the
 architectures at once...  You said in the original message that Thumb2 and
ARM
 modes are already not equally sensitive to it, so it's not unthinkable
that
 some architectures might not want it at all.  Therefore, yes, this should
 preferably be handled in arm_legitimize_address.

I was thinking it would be more accurate to capture the scaled_offset
without over-kill by doing it in offset_address.  Only problem I can image
is the change forces addr into register, which might nullifies backend
dependent transformation.
I will try to do it in arm_legitimize_address and see what's the result.

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Oleg Endo
On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote:
  My mistake. It's because arm_legitimize_address cannot re-factor [r105 +
  r165*4 + (-2064)] into rx = r105 + (-2064); [rx + r165*4].  Do you
  suggest that this kind of transformation should be done in backend?  I can
  think of some disadvantages by doing it in backend:
  Different kinds of address expression might be wanted in different phase of
  compilation, for example, sometimes GCC wants to force computation of
  address expression out of memory access because it cannot CSE array
  indexing.  It's difficult to differentiate in backend.
 
 It's equally difficult in memory_address_addr_space and it affects all the 
 architectures at once...  You said in the original message that Thumb2 and 
 ARM 
 modes are already not equally sensitive to it, so it's not unthinkable that 
 some architectures might not want it at all.  Therefore, yes, this should 
 preferably be handled in arm_legitimize_address.

My observation is, that legitimizing addressing modes in the backend by
looking at one isolated address works, but doesn't give good results.
In the SH backend there is this a particular case with displacement
addressing (register + constant).  On SH displacements for byte
addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
heuristic to satisfy the displacement constraint and splits out a plus
insn if needed to adjust the base address.  Of course that fixed
heuristic doesn't work for some cases and thus sometimes results in
unnecessary base address adjustments.
I had the idea of replacing the current (partially defunct) auto-inc-dec
RTL pass with a more generic addressing mode selection pass:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590

Any suggestions/comments/... are highly appreciated.

Cheers,
Oleg



RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-16 Thread Bin Cheng


 -Original Message-
 From: Eric Botcazou [mailto:ebotca...@adacore.com]
 Sent: Saturday, June 15, 2013 5:37 PM
 To: Bin Cheng
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
 when expanding array reference
 
  As reported in pr57540, gcc chooses bad address mode, resulting in A)
  invariant part of address expression is not kept or hoisted; b)
  additional computation which should be encoded in address expression.
  The reason is when gcc runs into addr+offset (which is invalid)
  during expanding, it pre-computes the entire address and accesses memory
 unit using MEM[reg].
  Yet we can force addr into register and try to generate reg+offset
  which is valid for targets like ARM.  By doing this, we can:
  1) keep addr in loop invariant form and hoist it later;
  2) saving additional computation by taking advantage of scaled
  addressing mode;
 
 Does the invalid address not go through arm_legitimize_address from here?
 
   /* Perform machine-dependent transformations on X
in certain cases.  This is not necessary since the code
below can handle all possible cases, but machine-dependent
transformations can make better code.  */
   {
   rtx orig_x = x;
   x = targetm.addr_space.legitimize_address (x, oldx, mode, as);
   if (orig_x != x  memory_address_addr_space_p (mode, x, as))
 goto done;
   }
 

Hi Eric,

The problem occurs when accessing local array element. For example,
# VUSE .MEM_27
k_8 = parent[k_29];

GCC calculates the address in three steps:
1) addr is calculated as r105 + (-2064).
2) offset is calculated as r165*4.
3) calls offset_address to combine the address into r105+ r165*4 +
(-2064).

Since ADDR is valid and there is no call to memory_address_addr_space in
offset_address, the invalid address expression has no chance to go through
target dependent legitimization function.  Even there is a chance, the
current implementation of memory_address_addr_space prevents the scaled
address expression from being generated because of below code:
  if (! cse_not_expected  !REG_P (x))
  x = break_out_memory_refs (x);

Thanks.
bin





Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-15 Thread Eric Botcazou
 As reported in pr57540, gcc chooses bad address mode, resulting in A)
 invariant part of address expression is not kept or hoisted; b) additional
 computation which should be encoded in address expression.  The reason is
 when gcc runs into addr+offset (which is invalid) during expanding, it
 pre-computes the entire address and accesses memory unit using MEM[reg].
 Yet we can force addr into register and try to generate reg+offset which
 is valid for targets like ARM.  By doing this, we can:
 1) keep addr in loop invariant form and hoist it later;
 2) saving additional computation by taking advantage of scaled addressing
 mode;

Does the invalid address not go through arm_legitimize_address from here?

  /* Perform machine-dependent transformations on X
 in certain cases.  This is not necessary since the code
 below can handle all possible cases, but machine-dependent
 transformations can make better code.  */
  {
rtx orig_x = x;
x = targetm.addr_space.legitimize_address (x, oldx, mode, as);
if (orig_x != x  memory_address_addr_space_p (mode, x, as))
  goto done;
  }

-- 
Eric Botcazou


[PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-14 Thread Bin Cheng
Hi,
As reported in pr57540, gcc chooses bad address mode, resulting in A)
invariant part of address expression is not kept or hoisted; b) additional
computation which should be encoded in address expression.  The reason is
when gcc runs into addr+offset (which is invalid) during expanding, it
pre-computes the entire address and accesses memory unit using MEM[reg].
Yet we can force addr into register and try to generate reg+offset which
is valid for targets like ARM.  By doing this, we can:
1) keep addr in loop invariant form and hoist it later;
2) saving additional computation by taking advantage of scaled addressing
mode;

This issue has substantial impact for ARM mode, and also benefits Thumb2
although not so much as ARM mode.  For example from the bug entry, assembly
code like:
blt .L3
.L5:
add lr, sp, #2064loop invariant
add r2, r2, #1
add r3, lr, r3, asl #2
ldr r3, [r3, #-2064]
cmp r3, #0
bge .L5
uxtbr2, r2

can be optimized into:

blt .L3 
.L5:
add r2, r2, #1
ldr r3, [sp, r3, asl #2]
cmp r3, #0
bge .L5
uxtbr2, r2

Bootstrap and test on x86/arm, any comments?

Thanks.
bin

2013-06-13  Bin Cheng  bin.ch...@arm.com

PR target/57540
* emit-rtl.c (offset_address): Try to force ADDR into register and
generate reg+offset if addr+offset is invalid.
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 199949)
+++ gcc/emit-rtl.c  (working copy)
@@ -2175,15 +2175,20 @@ offset_address (rtx memref, rtx offset, unsigned H
 
   /* At this point we don't know _why_ the address is invalid.  It
  could have secondary memory references, multiplies or anything.
+ Yet we can try to force addr into register, in order to catch
+ the scaled addressing opportunity as reg + scaled_offset.
 
- However, if we did go and rearrange things, we can wind up not
+ Otherwise, if we did go and rearrange things, we can wind up not
  being able to recognize the magic around pic_offset_table_rtx.
  This stuff is fragile, and is yet another example of why it is
- bad to expose PIC machinery too early.  */
+ bad to expose PIC machinery too early.  We may also wind up not
+ being able to recognize the scaled addressing pattern.
+
+ It won't hurt because the address here is invalid and we are
+ going to pre-compute it anyway.  */
   if (! memory_address_addr_space_p (GET_MODE (memref), new_rtx,
 attrs.addrspace)
-   GET_CODE (addr) == PLUS
-   XEXP (addr, 0) == pic_offset_table_rtx)
+   GET_CODE (addr) == PLUS)
 {
   addr = force_reg (GET_MODE (addr), addr);
   new_rtx = simplify_gen_binary (PLUS, address_mode, addr, offset);