Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)
Tejas Belagod wrote: Ulrich Weigand wrote: The following patch implements this idea; it passes a basic regression test on arm-linux-gnueabi. (Obviously this would need a lot more testing on various platforms before getting into mainline ...) Can you have a look whether this fixes the problem you're seeing? Sorry for the delay in replying. Thanks for the patch. I tried this patch - it doesn't seem to reach as far as cleanup_subreg_operands (), but fails an assertion in push_reload () in reload.c:1307. I'm currently investigating this and will let you know the reason soon. Hi, Sorry for the delay in replying. These new issues that I was seeing were bugs specific to my code that I've fixed. Your patch seems to work fine. Thanks a lot for the patch. Thanks, Tejas Belagod. ARM.
Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)
Ulrich Weigand wrote: The following patch implements this idea; it passes a basic regression test on arm-linux-gnueabi. (Obviously this would need a lot more testing on various platforms before getting into mainline ...) Can you have a look whether this fixes the problem you're seeing? Sorry for the delay in replying. Thanks for the patch. I tried this patch - it doesn't seem to reach as far as cleanup_subreg_operands (), but fails an assertion in push_reload () in reload.c:1307. I'm currently investigating this and will let you know the reason soon. Thanks, Tejas Belagod. ARM. Bernd, Vlad, I'd appreciate your comments on this approach as well. Thanks, Ulrich ChangeLog: * reload.c (find_reloads_subreg_address): Remove FORCE_REPLACE parameter. Always replace normal subreg with memory reference whenever possible. Return NULL otherwise. (find_reloads_toplev): Always call find_reloads_subreg_address for subregs of registers equivalent to a memory location. Only recurse further if find_reloads_subreg_address fails. (find_reloads_address_1): Only call find_reloads_subreg_address for subregs of registers equivalent to a memory location. Properly handle failure of find_reloads_subreg_address. Index: gcc/reload.c === *** gcc/reload.c(revision 189809) --- gcc/reload.c(working copy) *** static int find_reloads_address_1 (enum *** 282,288 static void find_reloads_address_part (rtx, rtx *, enum reg_class, enum machine_mode, int, enum reload_type, int); ! static rtx find_reloads_subreg_address (rtx, int, int, enum reload_type, int, rtx, int *); static void copy_replacements_1 (rtx *, rtx *, int); static int find_inc_amount (rtx, rtx); --- 282,288 static void find_reloads_address_part (rtx, rtx *, enum reg_class, enum machine_mode, int, enum reload_type, int); ! static rtx find_reloads_subreg_address (rtx, int, enum reload_type, int, rtx, int *); static void copy_replacements_1 (rtx *, rtx *, int); static int find_inc_amount (rtx, rtx); *** find_reloads_toplev (rtx x, int opnum, e *** 4755,4785 } /* If the subreg contains a reg that will be converted to a mem, !convert the subreg to a narrower memref now. !Otherwise, we would get (subreg (mem ...) ...), !which would force reload of the mem. ! !We also need to do this if there is an equivalent MEM that is !not offsettable. In that case, alter_subreg would produce an !invalid address on big-endian machines. ! !For machines that extend byte loads, we must not reload using !a wider mode if we have a paradoxical SUBREG. find_reloads will !force a reload in that case. So we should not do anything here. */ if (regno >= FIRST_PSEUDO_REGISTER ! #ifdef LOAD_EXTEND_OP ! && !paradoxical_subreg_p (x) ! #endif ! && (reg_equiv_address (regno) != 0 ! || (reg_equiv_mem (regno) != 0 ! && (! strict_memory_address_addr_space_p ! (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), ! MEM_ADDR_SPACE (reg_equiv_mem (regno))) ! || ! offsettable_memref_p (reg_equiv_mem (regno)) ! || num_not_at_initial_offset ! x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels, ! insn, address_reloaded); } for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) --- 4755,4773 } /* If the subreg contains a reg that will be converted to a mem, !attempt to convert the whole subreg to a (narrower or wider) !memory reference instead. If this succeeds, we're done -- !otherwise fall through to check whether the inner reg still !needs address reloads anyway. */ if (regno >= FIRST_PSEUDO_REGISTER ! && reg_equiv_memory_loc (regno) != 0) ! { ! tem = find_reloads_subreg_address (x, opnum, type, ind_levels, !insn, address_reloaded); ! if (tem) ! return tem; ! } } for (copied = 0, i = GET_RTX_LENGTH (code) - 1; i >= 0; i--) *** find_reloads_address_1 (enum machine_mod *** 6018,6029 if (ira_reg_class_max_nregs [rclass][GET_MODE (SUBREG_REG (x))] > reg_class_size[(int) rclass]) { ! x = find_reloads_subreg_address (x, 0, opnum, ! ADDR_TYPE (type), !
[PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)
Tejas Belagod wrote: > Tejas Belagod wrote: > > This is because offsettable_address_addr_space_p () gets as far as calling > > strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which > > returns true. The only way I see offsettable_address_addr_space_p () > > returning > > false would be mode_dependent_address_p () to return true for addr > > expression > > (PLUS (reg) (16)) which partly makes sense to me because PLUS is a > > mode-dependent address in that it cannot be allowed for NEON addressing > > modes, > > but it seems very generic for mode_dependent_address_p () to return true for > > PLUS in general instead of making a special case for vector modes. This > > distinction cannot be made in the target's mode_dependent_address_p() as > > 'mode' > > is not supplied to it. > > I dug a little deeper into offsettable_address_addr_space_p () and realized > that > it only gets reg_equiv_mem () which is MEM:OI (reg sp) to work with which does > not include the SUBREG_BYTE, therefore mode_dependent_address_p () does not > have > PLUS to check for address tree-mode dependency. Sorry for the late reply, it took me a while to understand what's really going on here. I now agree that this is definitely a bug in reload; it's clear that offsettable_memref_p does not and cannot catch this case. In fact, it does not even have enough information to answer the question in any sensible way (except for just about always returning "no", which wouldn't really be useful). I also agree with the general approach in your patch. The basic idea is that it makes no sense to ask a generic question like "would it be valid to add some (unknown) offset and change to some (unknown) mode?", when instead we can ask quite specifically for the *known* offset and mode we want to change to. However, I'd prefer this to go even further than what your patch does: I think we should not be querying "offsettable_memref_p" *at all* here. With your patch, you still call offsettable_memref_p on the address that has already been offset -- asking for more offsets seems pointless. Also, there is another call to offsettable_memref_p *within* find_reloads_subreg_address --which gets used when FORCE_REPLACE is false-- which suffers from the same problem as the call in find_reloads_toplev, and likewise ought to be eliminated. Taking a step back, let's look at the ways a (subreg (reg)) where reg is a pseudo equivalent to a memory location can be handled: - The "default" way would be to reload the inner reg in its proper mode from its equivalent memory location into a reload register, and use a subreg of that register as the operand. This always works correcly, but sometime causes unnecessary reloads, if the insn could actually handle a memory operand directly. - To avoid that unnecessary reload, we can instead attempt to replace the whole subreg with a modified (usually narrowed or widended) memory reference. This can then be either used directly in the insn, or itself be reloaded. In the second case (outer reload), there can be a number of issues: - We may not be allowed at all to change the memory access if: * we have a paradoxical subreg that is implictly handled as a LOAD_EXTEND or ZERO_EXTEND due to LOAD_EXTEND_OP * we have a normal subreg that is implicitly acting on the full register due to WORD_REGISTER_OPERATIONS (the check for this seems to be incomplete in current code!) * the equivalent memory address is mode-dependent * we have a paradoxical subreg, and we cannot prove the widened memory is properly aligned (so we may cause either a misaligned access, or access to unmapped memory) - Even if we are in principle allowed to change the memory access, the modified address might not be valid (either because the original equivalent address is already invalid, or because it becomes invalid when adding an offset and/or changing its mode). We can still do the outer access in that case, but we will have to push address reloads (based on the modified address). Current code tries to be clever as to when to perform the substitution of the modified memory address: if it thinks no address reloads will be required in either case, it leaves the address as (subreg (reg)), allowing find_reloads to choose between (inner or outer) reloads or doing an (outer) access as memory operand directly. In either case, the actual change to a mem happens in cleanup_subreg_operands at the end. On the other hand, if address reloads *are* required, it is find_reloads_toplev/find_reloads_subreg_address that will replace either the subreg or the reg with an explicit (outer or inner) memory access, and push the corresponding address reloads. [ Note that find_reloads now no longer has the choice of switching between inner and outer access. In the case of an outer access, there still is the choice between a direct memory access in the insn and a reload.
Re: [PATCH][RFC, Reload]. Reload bug?
Tejas Belagod wrote: Ulrich Weigand wrote: Tejas Belagod wrote: Therefore strict_memory_address_addr_space_P () thinks that (mem:OI (reg sp)) is a valid target address and lets it pass as a subreg and does not narrow the subreg into a narrower memref. find_reloads_toplev () should have infact given strict_memory_address_addr_space_P () (mem:OI (plus:DI (reg sp) (const_int 16))) which will be returned as false as base+offset is invalid for NEON addressing modes and this will be reloaded into a narrower memref. Huh. I would have expected the offsettable_memref_p check - && (reg_equiv_address (regno) != 0 - || (reg_equiv_mem (regno) != 0 - && (! strict_memory_address_addr_space_p - (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), - MEM_ADDR_SPACE (reg_equiv_mem (regno))) - || ! offsettable_memref_p (reg_equiv_mem (regno)) ^^^ here - || num_not_at_initial_offset to fail, which should cause find_reloads_subreg_address to get called. Why is that not happening for you? This is because offsettable_address_addr_space_p () gets as far as calling strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which returns true. The only way I see offsettable_address_addr_space_p () returning false would be mode_dependent_address_p () to return true for addr expression (PLUS (reg) (16)) which partly makes sense to me because PLUS is a mode-dependent address in that it cannot be allowed for NEON addressing modes, but it seems very generic for mode_dependent_address_p () to return true for PLUS in general instead of making a special case for vector modes. This distinction cannot be made in the target's mode_dependent_address_p() as 'mode' is not supplied to it. I dug a little deeper into offsettable_address_addr_space_p () and realized that it only gets reg_equiv_mem () which is MEM:OI (reg sp) to work with which does not include the SUBREG_BYTE, therefore mode_dependent_address_p () does not have PLUS to check for address tree-mode dependency. Thanks, Tejas Belagod. ARM.
Re: [PATCH][RFC, Reload]. Reload bug?
Ulrich Weigand wrote: Tejas Belagod wrote: Therefore strict_memory_address_addr_space_P () thinks that (mem:OI (reg sp)) is a valid target address and lets it pass as a subreg and does not narrow the subreg into a narrower memref. find_reloads_toplev () should have infact given strict_memory_address_addr_space_P () (mem:OI (plus:DI (reg sp) (const_int 16))) which will be returned as false as base+offset is invalid for NEON addressing modes and this will be reloaded into a narrower memref. Huh. I would have expected the offsettable_memref_p check - && (reg_equiv_address (regno) != 0 - || (reg_equiv_mem (regno) != 0 - && (! strict_memory_address_addr_space_p - (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), - MEM_ADDR_SPACE (reg_equiv_mem (regno))) - || ! offsettable_memref_p (reg_equiv_mem (regno)) ^^^ here - || num_not_at_initial_offset to fail, which should cause find_reloads_subreg_address to get called. Why is that not happening for you? This is because offsettable_address_addr_space_p () gets as far as calling strict_memory_address_addr_space_p () with a QImode and (mode_sz - 1) which returns true. The only way I see offsettable_address_addr_space_p () returning false would be mode_dependent_address_p () to return true for addr expression (PLUS (reg) (16)) which partly makes sense to me because PLUS is a mode-dependent address in that it cannot be allowed for NEON addressing modes, but it seems very generic for mode_dependent_address_p () to return true for PLUS in general instead of making a special case for vector modes. This distinction cannot be made in the target's mode_dependent_address_p() as 'mode' is not supplied to it. Thanks, Tejas Belagod. ARM.
Re: [PATCH][RFC, Reload]. Reload bug?
Tejas Belagod wrote: > Therefore strict_memory_address_addr_space_P () thinks that > (mem:OI (reg sp)) is a valid target address and lets it pass as > a subreg and does not narrow the subreg into a narrower memref. > find_reloads_toplev () should have infact given > strict_memory_address_addr_space_P () > (mem:OI (plus:DI (reg sp) (const_int 16))) > which will be returned as false as base+offset is invalid for NEON > addressing modes and this will be reloaded into a narrower memref. Huh. I would have expected the offsettable_memref_p check > - && (reg_equiv_address (regno) != 0 > - || (reg_equiv_mem (regno) != 0 > - && (! strict_memory_address_addr_space_p > - (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), > -MEM_ADDR_SPACE (reg_equiv_mem (regno))) > - || ! offsettable_memref_p (reg_equiv_mem (regno)) ^^^ here > - || num_not_at_initial_offset to fail, which should cause find_reloads_subreg_address to get called. Why is that not happening for you? Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[PATCH][RFC, Reload]. Reload bug?
Hi, Attached is a fix for what seems to be a reload bug while handling subreg(mem...). I ran into this problem while implementing support for struct load/store in AArch64 using the standard patterns vec__lanes on the same lines of the ARM backend. The test case that caused the issue was: void SexiALI_Convert(void *vdest, void *vsrc, unsigned int frames, int n) { unsigned int x; short *src = vsrc; unsigned char *dest = vdest; for(x=0;x<256;x++) { int tmp; tmp = *src; src++; tmp += *src; src++; *dest++ = tmp; } } Before reload, this is the RTL dump I see: . (insn 110 114 111 4 (set (reg:V8HI 158 [ vect_var_.21 ]) (subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 0)) ice.i:9 512 {*aarch64_simd_movv8hi} (nil)) (insn 111 110 115 4 (set (reg:V8HI 159 [ vect_var_.22 ]) (subreg:V8HI (reg:OI 530 [ vect_array.20 ]) 16)) ice.i:9 512 {*aarch64_simd_movv8hi} (expr_list:REG_DEAD (reg:OI 530 [ vect_array.20 ]) (nil))) (insn 115 111 116 4 (set (reg:V8HI 161 [ vect_var_.24 ]) (subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 0)) ice.i:9 512 {*aarch64_simd_movv8hi} (nil)) (insn 116 115 117 4 (set (reg:V8HI 162 [ vect_var_.25 ]) (subreg:V8HI (reg:OI 529 [ vect_array.23 ]) 16)) ice.i:9 512 {*aarch64_simd_movv8hi} (expr_list:REG_DEAD (reg:OI 529 [ vect_array.23 ]) (nil))) (insn 117 116 118 4 (set (reg:V4SI 544 [ vect_var_.27 ]) (sign_extend:V4SI (vec_select:V4HI (reg:V8HI 159 [ vect_var_.22 ]) (parallel:V8HI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) ] ice.i:11 700 {aarch64_simd_vec_unpacks_lo_v8hi} (nil)) (insn 118 117 124 4 (set (reg:V4SI 545 [ vect_var_.26 ]) (sign_extend:V4SI (vec_select:V4HI (reg:V8HI 158 [ vect_var_.21 ]) (parallel:V8HI [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) ] ice.i:9 700 {aarch64_simd_vec_unpacks_lo_v8hi} (nil)) . In insn 116, reg_equiv_mem () of the psuedoreg 529 is (mem:OI (reg sp)), and the subreg is equivalent to: subreg:V8HI (mem:OI (reg sp) 16) which does not get folded into mem:V8HI (plus:DI (reg sp) (const_int 16)) because, in reload.c:find_reloads_toplev () where such subregs are narrowed into narower memrefs, the memref supplied to strict_memory_address_addr_space_P () is just (mem:OI (reg sp)) and the SUBREG_BYTE is forgotten. Therefore strict_memory_address_addr_space_P () thinks that (mem:OI (reg sp)) is a valid target address and lets it pass as a subreg and does not narrow the subreg into a narrower memref. find_reloads_toplev () should have infact given strict_memory_address_addr_space_P () (mem:OI (plus:DI (reg sp) (const_int 16)) ) which will be returned as false as base+offset is invalid for NEON addressing modes and this will be reloaded into a narrower memref. Also, I tried writing a secondary reload for this, but at no time is the RTL (subreg:V8HI (mem:OI (reg sp)) 16) available to the target secondary reload for it to fix it up. Therefore, I've fixed find_reloads_toplev () to pass the full address to strict_memory_address_addr_space_P () in the case of subregs. Does this look like a sane fix? I've tested this patch on arm-none-eabi and bootstrapped on x86_64-pc-linux and all is well. Thanks, Tejas Belagod. ARM. Changelog: 2012-06-28 Tejas Belagod gcc/ * reload.c (find_reloads_toplev): Include the subreg byte in the address of memrefs when converting subregs of mems into narrower memrefs.diff --git a/gcc/reload.c b/gcc/reload.c index e42cc5c..b6d4ce9 100644 --- a/gcc/reload.c +++ b/gcc/reload.c @@ -4771,15 +4771,27 @@ find_reloads_toplev (rtx x, int opnum, enum reload_type type, #ifdef LOAD_EXTEND_OP && !paradoxical_subreg_p (x) #endif - && (reg_equiv_address (regno) != 0 - || (reg_equiv_mem (regno) != 0 - && (! strict_memory_address_addr_space_p - (GET_MODE (x), XEXP (reg_equiv_mem (regno), 0), - MEM_ADDR_SPACE (reg_equiv_mem (regno))) - || ! offsettable_memref_p (reg_equiv_mem (regno)) - || num_not_at_initial_offset - x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels, - insn, address_reloaded); +) + { + if (reg_equiv_address (regno) != 0) + x = find_reloads_subreg_address (x, 1, opnum, type, ind_levels, +insn, address_reloaded); + else if (reg_equiv_mem (regno) != 0) + { + tem = + simplify_gen_subreg (GET_MODE (x), reg_equiv_mem (regno), +