Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)

2012-08-20 Thread Tejas Belagod

Tejas Belagod wrote:

Ulrich Weigand wrote:

The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?



Sorry for the delay in replying. Thanks for the patch. I tried this patch -  it
doesn't seem to reach as far as cleanup_subreg_operands (), but fails an
assertion in push_reload () in reload.c:1307. I'm currently investigating this
and will let you know the reason soon.



Hi,

Sorry for the delay in replying. These new issues that I was seeing were bugs 
specific to my code that I've fixed. Your patch seems to work fine. Thanks a lot 
for the patch.


Thanks,
Tejas Belagod.
ARM.



Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)

2012-08-02 Thread Tejas Belagod

Ulrich Weigand wrote:


The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?



Sorry for the delay in replying. Thanks for the patch. I tried this patch -  it 
doesn't seem to reach as far as cleanup_subreg_operands (), but fails an 
assertion in push_reload () in reload.c:1307. I'm currently investigating this 
and will let you know the reason soon.


Thanks,
Tejas Belagod.
ARM.



Bernd, Vlad, I'd appreciate your comments on this approach as well.

Thanks,
Ulrich


ChangeLog:

* reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE
parameter.  Always replace normal subreg with memory reference
whenever possible.  Return NULL otherwise.
(find_reloads_toplev): Always call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Only recurse further if find_reloads_subreg_address fails.
(find_reloads_address_1): Only call find_reloads_subreg_address
for subregs of registers equivalent to a memory location.
Properly handle failure of find_reloads_subreg_address.


Index: gcc/reload.c
===
*** gcc/reload.c(revision 189809)
--- gcc/reload.c(working copy)
*** static int find_reloads_address_1 (enum
*** 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
--- 282,288 
  static void find_reloads_address_part (rtx, rtx *, enum reg_class,
   enum machine_mode, int,
   enum reload_type, int);
! static rtx find_reloads_subreg_address (rtx, int, enum reload_type,
int, rtx, int *);
  static void copy_replacements_1 (rtx *, rtx *, int);
  static int find_inc_amount (rtx, rtx);
*** find_reloads_toplev (rtx x, int opnum, e
*** 4755,4785 
}

/* If the subreg contains a reg that will be converted to a mem,
!convert the subreg to a narrower memref now.
!Otherwise, we would get (subreg (mem ...) ...),
!which would force reload of the mem.
!
!We also need to do this if there is an equivalent MEM that is
!not offsettable.  In that case, alter_subreg would produce an
!invalid address on big-endian machines.
!
!For machines that extend byte loads, we must not reload using
!a wider mode if we have a paradoxical SUBREG.  find_reloads will
!force a reload in that case.  So we should not do anything here.  */

if (regno >= FIRST_PSEUDO_REGISTER
! #ifdef LOAD_EXTEND_OP
! && !paradoxical_subreg_p (x)
! #endif
! && (reg_equiv_address (regno) != 0
! || (reg_equiv_mem (regno) != 0
! && (! strict_memory_address_addr_space_p
! (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
!  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
! || ! offsettable_memref_p (reg_equiv_mem (regno))
! || num_not_at_initial_offset
!   x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
!  insn, address_reloaded);
  }

for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
--- 4755,4773 
}

/* If the subreg contains a reg that will be converted to a mem,
!attempt to convert the whole subreg to a (narrower or wider)
!memory reference instead.  If this succeeds, we're done --
!otherwise fall through to check whether the inner reg still
!needs address reloads anyway.  */

if (regno >= FIRST_PSEUDO_REGISTER
! && reg_equiv_memory_loc (regno) != 0)
!   {
! tem = find_reloads_subreg_address (x, opnum, type, ind_levels,
!insn, address_reloaded);
! if (tem)
!   return tem;
!   }
  }

for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
*** find_reloads_address_1 (enum machine_mod
*** 6018,6029 
  if (ira_reg_class_max_nregs [rclass][GET_MODE (SUBREG_REG (x))]
  > reg_class_size[(int) rclass])
{
! x = find_reloads_subreg_address (x, 0, opnum,
!  ADDR_TYPE (type),
! 

[PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)

2012-07-27 Thread Ulrich Weigand
Tejas Belagod wrote:
> Tejas Belagod wrote:
> > This is because offsettable_address_addr_space_p () gets as far as calling
> > strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which
> > returns true. The only way I see offsettable_address_addr_space_p () 
> > returning
> > false would be mode_dependent_address_p () to return true for addr 
> > expression
> > (PLUS (reg) (16)) which partly makes sense to me because PLUS is a
> > mode-dependent address in that it cannot be allowed for NEON addressing 
> > modes,
> > but it seems very generic for mode_dependent_address_p () to return true for
> > PLUS in general instead of making a special case for vector modes. This
> > distinction cannot be made in the target's mode_dependent_address_p() as 
> > 'mode'
> > is not supplied to it.
> 
> I dug a little deeper into offsettable_address_addr_space_p () and realized 
> that
> it only gets reg_equiv_mem () which is MEM:OI (reg sp) to work with which does
> not include the SUBREG_BYTE, therefore mode_dependent_address_p () does not 
> have
> PLUS to check for address tree-mode dependency.

Sorry for the late reply, it took me a while to understand what's
really going on here.

I now agree that this is definitely a bug in reload; it's clear that
offsettable_memref_p does not and cannot catch this case.  In fact,
it does not even have enough information to answer the question in
any sensible way (except for just about always returning "no", which
wouldn't really be useful).

I also agree with the general approach in your patch.  The basic idea
is that it makes no sense to ask a generic question like "would it
be valid to add some (unknown) offset and change to some (unknown)
mode?", when instead we can ask quite specifically for the *known*
offset and mode we want to change to.  However, I'd prefer this to
go even further than what your patch does: I think we should not
be querying "offsettable_memref_p" *at all* here.

With your patch, you still call offsettable_memref_p on the address
that has already been offset -- asking for more offsets seems pointless.
Also, there is another call to offsettable_memref_p *within*
find_reloads_subreg_address --which gets used when FORCE_REPLACE is
false-- which suffers from the same problem as the call in
find_reloads_toplev, and likewise ought to be eliminated.


Taking a step back, let's look at the ways a (subreg (reg)) where
reg is a pseudo equivalent to a memory location can be handled:

- The "default" way would be to reload the inner reg in its proper
  mode from its equivalent memory location into a reload register,
  and use a subreg of that register as the operand.  This always
  works correcly, but sometime causes unnecessary reloads, if the
  insn could actually handle a memory operand directly.

- To avoid that unnecessary reload, we can instead attempt to replace
  the whole subreg with a modified (usually narrowed or widended)
  memory reference.  This can then be either used directly in the insn,
  or itself be reloaded.

In the second case (outer reload), there can be a number of issues:

- We may not be allowed at all to change the memory access if:

  * we have a paradoxical subreg that is implictly handled as a
LOAD_EXTEND or ZERO_EXTEND due to LOAD_EXTEND_OP

  * we have a normal subreg that is implicitly acting on the full
register due to WORD_REGISTER_OPERATIONS  (the check for this
seems to be incomplete in current code!)

  * the equivalent memory address is mode-dependent

  * we have a paradoxical subreg, and we cannot prove the widened
memory is properly aligned (so we may cause either a misaligned
access, or access to unmapped memory)

- Even if we are in principle allowed to change the memory access,
  the modified address might not be valid (either because the
  original equivalent address is already invalid, or because it
  becomes invalid when adding an offset and/or changing its mode).
  We can still do the outer access in that case, but we will have
  to push address reloads (based on the modified address).

  Current code tries to be clever as to when to perform the
  substitution of the modified memory address: if it thinks no
  address reloads will be required in either case, it leaves the
  address as (subreg (reg)), allowing find_reloads to choose
  between (inner or outer) reloads or doing an (outer) access
  as memory operand directly.  In either case, the actual change
  to a mem happens in cleanup_subreg_operands at the end.

  On the other hand, if address reloads *are* required, it is
  find_reloads_toplev/find_reloads_subreg_address that will
  replace either the subreg or the reg with an explicit (outer
  or inner) memory access, and push the corresponding address
  reloads.  [ Note that find_reloads now no longer has the
  choice of switching between inner and outer access.  In the
  case of an outer access, there still is the choice between
  a direct memory access in the insn and a reload.  

Re: [PATCH][RFC, Reload]. Reload bug?

2012-07-04 Thread Tejas Belagod

Tejas Belagod wrote:

Ulrich Weigand wrote:

Tejas Belagod wrote:


Therefore strict_memory_address_addr_space_P () thinks that
(mem:OI (reg sp)) is a valid target address and lets it pass as
a subreg and does not narrow the subreg into a narrower memref.
find_reloads_toplev () should have infact given 
strict_memory_address_addr_space_P ()

(mem:OI (plus:DI (reg sp) (const_int 16)))
which will be returned as false as base+offset is invalid for NEON
addressing modes and this will be reloaded into a narrower memref.

Huh.  I would have expected the offsettable_memref_p check


- && (reg_equiv_address (regno) != 0
- || (reg_equiv_mem (regno) != 0
- && (! strict_memory_address_addr_space_p
- (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
-  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
- || ! offsettable_memref_p (reg_equiv_mem (regno))

^^^ here

- || num_not_at_initial_offset

to fail, which should cause find_reloads_subreg_address to get called.

Why is that not happening for you?


This is because offsettable_address_addr_space_p () gets as far as calling
strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which
returns true. The only way I see offsettable_address_addr_space_p () returning
false would be mode_dependent_address_p () to return true for addr expression
(PLUS (reg) (16)) which partly makes sense to me because PLUS is a
mode-dependent address in that it cannot be allowed for NEON addressing modes,
but it seems very generic for mode_dependent_address_p () to return true for
PLUS in general instead of making a special case for vector modes. This
distinction cannot be made in the target's mode_dependent_address_p() as 'mode'
is not supplied to it.



I dug a little deeper into offsettable_address_addr_space_p () and realized that
it only gets reg_equiv_mem () which is MEM:OI (reg sp) to work with which does
not include the SUBREG_BYTE, therefore mode_dependent_address_p () does not have
PLUS to check for address tree-mode dependency.

Thanks,
Tejas Belagod.
ARM.



Re: [PATCH][RFC, Reload]. Reload bug?

2012-06-29 Thread Tejas Belagod

Ulrich Weigand wrote:

Tejas Belagod wrote:


Therefore strict_memory_address_addr_space_P () thinks that
(mem:OI (reg sp)) is a valid target address and lets it pass as
a subreg and does not narrow the subreg into a narrower memref.
find_reloads_toplev () should have infact given 
strict_memory_address_addr_space_P ()

(mem:OI (plus:DI (reg sp) (const_int 16)))
which will be returned as false as base+offset is invalid for NEON
addressing modes and this will be reloaded into a narrower memref.


Huh.  I would have expected the offsettable_memref_p check


- && (reg_equiv_address (regno) != 0
- || (reg_equiv_mem (regno) != 0
- && (! strict_memory_address_addr_space_p
- (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
-  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
- || ! offsettable_memref_p (reg_equiv_mem (regno))


^^^ here

- || num_not_at_initial_offset


to fail, which should cause find_reloads_subreg_address to get called.

Why is that not happening for you?


This is because offsettable_address_addr_space_p () gets as far as calling
strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which
returns true. The only way I see offsettable_address_addr_space_p () returning
false would be mode_dependent_address_p () to return true for addr expression
(PLUS (reg) (16)) which partly makes sense to me because PLUS is a
mode-dependent address in that it cannot be allowed for NEON addressing modes,
but it seems very generic for mode_dependent_address_p () to return true for
PLUS in general instead of making a special case for vector modes. This
distinction cannot be made in the target's mode_dependent_address_p() as 'mode'
is not supplied to it.

Thanks,
Tejas Belagod.
ARM.



Re: [PATCH][RFC, Reload]. Reload bug?

2012-06-29 Thread Ulrich Weigand
Tejas Belagod wrote:

> Therefore strict_memory_address_addr_space_P () thinks that
> (mem:OI (reg sp)) is a valid target address and lets it pass as
> a subreg and does not narrow the subreg into a narrower memref.
> find_reloads_toplev () should have infact given 
> strict_memory_address_addr_space_P ()
> (mem:OI (plus:DI (reg sp) (const_int 16)))
> which will be returned as false as base+offset is invalid for NEON
> addressing modes and this will be reloaded into a narrower memref.

Huh.  I would have expected the offsettable_memref_p check

> -   && (reg_equiv_address (regno) != 0
> -   || (reg_equiv_mem (regno) != 0
> -   && (! strict_memory_address_addr_space_p
> -   (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
> -MEM_ADDR_SPACE (reg_equiv_mem (regno)))
> -   || ! offsettable_memref_p (reg_equiv_mem (regno))

^^^ here
> -   || num_not_at_initial_offset

to fail, which should cause find_reloads_subreg_address to get called.

Why is that not happening for you?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[PATCH][RFC, Reload]. Reload bug?

2012-06-28 Thread Tejas Belagod


Hi,

Attached is a fix for what seems to be a reload bug while handling 
subreg(mem...). I ran into this problem while implementing support for struct 
load/store in AArch64 using the standard patterns 
vec__lanes on the same lines of the ARM 
backend. The test case that caused the issue was:


void SexiALI_Convert(void *vdest, void *vsrc, unsigned int frames, int n)
{
 unsigned int x;
 short *src = vsrc;
 unsigned char *dest = vdest;
 for(x=0;x<256;x++)
 {
  int tmp;
  tmp = *src;
  src++;
  tmp += *src;
  src++;
  *dest++ = tmp;
 }
}

Before reload, this is the RTL dump I see:

.
(insn 110 114 111 4 (set (reg:V8HI 158 [ vect_var_.21 ])
(subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 0)) ice.i:9 512 
{*aarch64_simd_movv8hi}

 (nil))

(insn 111 110 115 4 (set (reg:V8HI 159 [ vect_var_.22 ])
(subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 16)) ice.i:9 512 
{*aarch64_simd_movv8hi}

 (expr_list:REG_DEAD (reg:OI 530 [ vect_array.20 ])
(nil)))

(insn 115 111 116 4 (set (reg:V8HI 161 [ vect_var_.24 ])
(subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 0)) ice.i:9 512 
{*aarch64_simd_movv8hi}

 (nil))

(insn 116 115 117 4 (set (reg:V8HI 162 [ vect_var_.25 ])
(subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 16)) ice.i:9 512 
{*aarch64_simd_movv8hi}

 (expr_list:REG_DEAD (reg:OI 529 [ vect_array.23 ])
(nil)))

(insn 117 116 118 4 (set (reg:V4SI 544 [ vect_var_.27 ])
(sign_extend:V4SI (vec_select:V4HI (reg:V8HI 159 [ vect_var_.22 ])
(parallel:V8HI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
] ice.i:11 700 {aarch64_simd_vec_unpacks_lo_v8hi}
 (nil))

(insn 118 117 124 4 (set (reg:V4SI 545 [ vect_var_.26 ])
(sign_extend:V4SI (vec_select:V4HI (reg:V8HI 158 [ vect_var_.21 ])
(parallel:V8HI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
] ice.i:9 700 {aarch64_simd_vec_unpacks_lo_v8hi}
 (nil))

.

In insn 116, reg_equiv_mem () of the psuedoreg 529 is (mem:OI (reg sp)), and the 
subreg is equivalent to:

subreg:V8HI (mem:OI (reg sp) 16)
which does not get folded into
mem:V8HI (plus:DI (reg sp) (const_int 16))
because, in reload.c:find_reloads_toplev () where such subregs are narrowed into
narower memrefs, the memref supplied to strict_memory_address_addr_space_P () is
just (mem:OI (reg sp)) and the SUBREG_BYTE is forgotten. Therefore
strict_memory_address_addr_space_P () thinks that (mem:OI (reg sp)) is a
valid target address and lets it pass as a subreg and does not narrow the subreg
into a narrower memref. find_reloads_toplev () should have infact given
strict_memory_address_addr_space_P () (mem:OI (plus:DI (reg sp) (const_int 16)) 
) which will be returned as false as base+offset is invalid for NEON addressing

modes and this will be reloaded into a narrower memref.

Also, I tried writing a secondary reload for this, but at no time is the RTL
(subreg:V8HI (mem:OI (reg sp)) 16)
available to the target secondary reload for it to fix it up.

Therefore, I've fixed find_reloads_toplev () to pass the full address to 
strict_memory_address_addr_space_P () in the case of subregs.


Does this look like a sane fix?

I've tested this patch on arm-none-eabi and bootstrapped on x86_64-pc-linux and
all is well.

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-06-28  Tejas Belagod  

gcc/
* reload.c (find_reloads_toplev): Include the subreg byte in the address
of memrefs when converting subregs of mems into narrower memrefs.diff --git a/gcc/reload.c b/gcc/reload.c
index e42cc5c..b6d4ce9 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -4771,15 +4771,27 @@ find_reloads_toplev (rtx x, int opnum, enum reload_type 
type,
 #ifdef LOAD_EXTEND_OP
  && !paradoxical_subreg_p (x)
 #endif
- && (reg_equiv_address (regno) != 0
- || (reg_equiv_mem (regno) != 0
- && (! strict_memory_address_addr_space_p
- (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0),
-  MEM_ADDR_SPACE (reg_equiv_mem (regno)))
- || ! offsettable_memref_p (reg_equiv_mem (regno))
- || num_not_at_initial_offset
-   x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
-  insn, address_reloaded);
+)
+   {
+ if (reg_equiv_address (regno) != 0)
+   x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels,
+insn, address_reloaded);
+ else if (reg_equiv_mem (regno) != 0)
+   {
+ tem =
+   simplify_gen_subreg (GET_MODE (x), reg_equiv_mem (regno),
+