Re: PR53914, rs6000 constraints and reload queries
On Wed, Aug 1, 2012 at 9:52 AM, Alan Modra amo...@gmail.com wrote: Hi David, This is the last of my vendetta against the o constraint in the rs6000 backend. Remaining are a few places in rs6000.md where the operand predicate is offsettable_mem_operand, and one instance in spe.md. I believe none of these can run into mode_dependent_address causing insn does not statisfy its constraints. Also, with Y replacing o everywhere a 64-bit gpr is loaded or stored, we can remove the +0 wrapper you added for pr13674. I also add some testcases with this patch. dfmode_off and tfmode_off will fail on e500, but the generated code at -O2 is so awful that I'm inclined to think it should remain that way until someone fixes it! Bootstrapped and regression tested powerpc64-linux. OK to apply? gcc/ * config/rs6000/rs6000.c (legitimize_reload_address): Remove code handling non-aligned ld/std. * config/rs6000/paired.md (movv2sf_paired): Use 'Y' instead of 'o'. * config/rs6000/vsx.md (vsx_mov, vsx_movti): Likewise. * config/rs6000/altivec.md (altivec_mov, altivec_movti): Likewise. * config/rs6000/dfp.md (movtd_internal): Use 'm' instead of 'o'. gcc/testsuite/ * gcc.target/powerpc/dimode_off.c: New. * gcc.target/powerpc/timode_off.c: New. * gcc.target/powerpc/dfmode_off.c: New. * gcc.target/powerpc/tfmode_off.c: New. This is okay. Thanks, David
Re: PR53914, rs6000 constraints and reload queries
Hi David, This is the last of my vendetta against the o constraint in the rs6000 backend. Remaining are a few places in rs6000.md where the operand predicate is offsettable_mem_operand, and one instance in spe.md. I believe none of these can run into mode_dependent_address causing insn does not statisfy its constraints. Also, with Y replacing o everywhere a 64-bit gpr is loaded or stored, we can remove the +0 wrapper you added for pr13674. I also add some testcases with this patch. dfmode_off and tfmode_off will fail on e500, but the generated code at -O2 is so awful that I'm inclined to think it should remain that way until someone fixes it! Bootstrapped and regression tested powerpc64-linux. OK to apply? gcc/ * config/rs6000/rs6000.c (legitimize_reload_address): Remove code handling non-aligned ld/std. * config/rs6000/paired.md (movv2sf_paired): Use 'Y' instead of 'o'. * config/rs6000/vsx.md (vsx_mov, vsx_movti): Likewise. * config/rs6000/altivec.md (altivec_mov, altivec_movti): Likewise. * config/rs6000/dfp.md (movtd_internal): Use 'm' instead of 'o'. gcc/testsuite/ * gcc.target/powerpc/dimode_off.c: New. * gcc.target/powerpc/timode_off.c: New. * gcc.target/powerpc/dfmode_off.c: New. * gcc.target/powerpc/tfmode_off.c: New. Index: gcc/testsuite/gcc.target/powerpc/dimode_off.c === --- gcc/testsuite/gcc.target/powerpc/dimode_off.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/dimode_off.c (revision 0) @@ -0,0 +1,50 @@ +/* { dg-do assemble } */ +/* { dg-options -O2 -fno-align-functions -mtraceback=no -save-temps } */ + +void w1 (void *x, long long y) { *(long long *) (x + 32767) = y; } +void w2 (void *x, long long y) { *(long long *) (x + 32766) = y; } +void w3 (void *x, long long y) { *(long long *) (x + 32765) = y; } +void w4 (void *x, long long y) { *(long long *) (x + 32764) = y; } +void w5 (void *x, long long y) { *(long long *) (x + 32763) = y; } +void w6 (void *x, long long y) { *(long long *) (x + 32762) = y; } +void w7 (void *x, long long y) { *(long long *) (x + 32761) = y; } +void w8 (void *x, long long y) { *(long long *) (x + 32760) = y; } +void w9 (void *x, long long y) { *(long long *) (x + 32759) = y; } +void w10 (void *x, long long y) { *(long long *) (x + 32758) = y; } +void w11 (void *x, long long y) { *(long long *) (x + 32757) = y; } +void w12 (void *x, long long y) { *(long long *) (x + 32756) = y; } +void w13 (void *x, long long y) { *(long long *) (x + 32755) = y; } +void w14 (void *x, long long y) { *(long long *) (x + 32754) = y; } +void w15 (void *x, long long y) { *(long long *) (x + 32753) = y; } +void w16 (void *x, long long y) { *(long long *) (x + 32752) = y; } +void w17 (void *x, long long y) { *(long long *) (x + 32751) = y; } +void w18 (void *x, long long y) { *(long long *) (x + 32750) = y; } +void w19 (void *x, long long y) { *(long long *) (x + 32749) = y; } +void w20 (void *x, long long y) { *(long long *) (x + 32748) = y; } + +long long r1 (void *x) { return *(long long *) (x + 32767); } +long long r2 (void *x) { return *(long long *) (x + 32766); } +long long r3 (void *x) { return *(long long *) (x + 32765); } +long long r4 (void *x) { return *(long long *) (x + 32764); } +long long r5 (void *x) { return *(long long *) (x + 32763); } +long long r6 (void *x) { return *(long long *) (x + 32762); } +long long r7 (void *x) { return *(long long *) (x + 32761); } +long long r8 (void *x) { return *(long long *) (x + 32760); } +long long r9 (void *x) { return *(long long *) (x + 32759); } +long long r10 (void *x) { return *(long long *) (x + 32758); } +long long r11 (void *x) { return *(long long *) (x + 32757); } +long long r12 (void *x) { return *(long long *) (x + 32756); } +long long r13 (void *x) { return *(long long *) (x + 32755); } +long long r14 (void *x) { return *(long long *) (x + 32754); } +long long r15 (void *x) { return *(long long *) (x + 32753); } +long long r16 (void *x) { return *(long long *) (x + 32752); } +long long r17 (void *x) { return *(long long *) (x + 32751); } +long long r18 (void *x) { return *(long long *) (x + 32750); } +long long r19 (void *x) { return *(long long *) (x + 32749); } +long long r20 (void *x) { return *(long long *) (x + 32748); } + +/* { dg-final { object-size text == 440 { target { lp64 } } } } */ +/* 32-bit test should really be == 512 bytes, see pr54110 */ +/* { dg-final { object-size text = 640 { target { ilp32 } } } } */ +/* { dg-final { scan-assembler-not (st|l)fd } } */ +/* { dg-final { cleanup-saved-temps dimode_off } } */ Index: gcc/testsuite/gcc.target/powerpc/timode_off.c === --- gcc/testsuite/gcc.target/powerpc/timode_off.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/timode_off.c (revision 0) @@ -0,0 +1,51 @@ +/* { dg-do assemble { target { lp64 } } } */ +/* {
Re: PR53914, rs6000 constraints and reload queries
On Mon, Jul 23, 2012 at 07:30:23PM -0400, David Edelsohn wrote: This is okay Committed revision 189801. PR target/53914 PR target/54009 * config/rs6000/constraints.md (Y): Use mem_operand_gpr. * config/rs6000/predicates.md (word_offset_memref_operand): Delete. Adjust all rs6000_legitimate_offset_address_p calls. * config/rs6000/rs6000-protos.h (mem_operand_gpr): Declare. (rs6000_secondary_reload_gpr): Declare. (rs6000_legitimate_offset_address_p): Update prototype. (rs6000_offsettable_memref_p): Delete. (rs6000_secondary_reload_ppc64): Delete. * config/rs6000/rs6000.c (address_offset): New function. (mem_operand_gpr): Likewise. (rs6000_legitimate_offset_address_p): Add worst_case param. When not worst_case assume class of regs with least restrictive offsets. Adjust all calls. (legitimate_lo_sum_address_p): Simplify register mode tests. (rs6000_legitimize_address): Likewise. Assume best case offset addressing. Combine ELF and MACHO lo_sum code. (rs6000_mode_dependent_address): Correct offset addressing limits. (rs6000_offsettable_memref_p): Make static, add reg_mode param. Use reg_mode to help rs6000_legitimate_offset_address_p. (rs6000_secondary_reload): Use address_offset. Handle 32-bit multi gpr load/store when offset too large. (rs6000_secondary_reload_gpr): Renamed rs6000_secondary_reload_ppc64. (rs6000_split_multireg_move): Adjust rs6000_offsettable_memref_p calls. * config/rs6000/rs6000.md (movdf_hardfloat32): Use 'Y' constraint for gpr load/store. Order alternatives as r-Y,Y-r,r-r and d-m,m-d,d-d. Correct size of gpr load/store. (movdf_softfloat32): Use 'Y' constraint for gpr load/store. Order alternatives. (movti_ppc64): Likewise. (movdi_internal32): Likewise. Also disparage fprs. (movdi_mfpgpr, movdi_internal64): Likewise. (movtf_internal): Use 'm' for fpr load/store. Order alternatives. (movtf_softfloat): Order alternatives. (extenddftf2_internal): Use 'm' and 'Y' for store. (movti_power, movti_string): Use 'Y' for gpr load/store. Order. (stack_protect_setdi, stack_protect_testdi): Likewise. (movdf_hardfloat64_mfpgpr, movdf_hardfloat64): Order alternatives. (movdf_softfloat64): Likewise. (reload_mode_store): Adjust reload_di_store to provide reload_si_store as well. (reload_mode_load): Likewise. Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 189800) +++ gcc/config/rs6000/constraints.md(working copy) @@ -150,8 +150,9 @@ to use @samp{m} or @samp{es} in @code{asm} stateme (match_test GET_CODE (XEXP (op, 0)) == REG))) (define_memory_constraint Y - Indexed or word-aligned displacement memory operand - (match_operand 0 word_offset_memref_operand)) + memory operand for 8 byte and 16 byte gpr load/store + (and (match_code mem) + (match_operand 0 mem_operand_gpr))) (define_memory_constraint Z Memory operand that is an indexed or indirect from a register (it is Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 189800) +++ gcc/config/rs6000/predicates.md (working copy) @@ -432,29 +432,6 @@ (and (match_operand 0 memory_operand) (match_test offsettable_nonstrict_memref_p (op -;; Return 1 if the operand is a memory operand with an address divisible by 4 -(define_predicate word_offset_memref_operand - (match_operand 0 memory_operand) -{ - /* Address inside MEM. */ - op = XEXP (op, 0); - - /* Extract address from auto-inc/dec. */ - if (GET_CODE (op) == PRE_INC - || GET_CODE (op) == PRE_DEC) -op = XEXP (op, 0); - else if (GET_CODE (op) == PRE_MODIFY) -op = XEXP (op, 1); - else if (GET_CODE (op) == LO_SUM - GET_CODE (XEXP (op, 0)) == REG - GET_CODE (XEXP (op, 1)) == CONST) -op = XEXP (XEXP (op, 1), 0); - - return (GET_CODE (op) != PLUS - || GET_CODE (XEXP (op, 1)) != CONST_INT - || INTVAL (XEXP (op, 1)) % 4 == 0); -}) - ;; Return 1 if the operand is an indexed or indirect memory operand. (define_predicate indexed_or_indirect_operand (match_code mem) @@ -892,7 +869,8 @@ return input_operand (op, mode); }) -;; Return true if OP is an invalid SUBREG operation on the e500. +;; Return true if OP is a non-immediate operand and not an invalid +;; SUBREG operation on the e500. (define_predicate rs6000_nonimmediate_operand (match_code reg,subreg,mem) { @@ -1325,7 +1303,7 @@ if (base_regno == 0) return 0; } - else if (rs6000_legitimate_offset_address_p (SImode, src_addr, 0)) + else if
Re: PR53914, rs6000 constraints and reload queries
On Thu, Jul 19, 2012 at 8:45 PM, Alan Modra amo...@gmail.com wrote: This on the other hand works. Please consider the patch amended to remove mem_operand_fpr, the 'wY' constraint, and uses thereof replaced with m. What, exactly, is the current proposed patch? Remove offsettable address constraint from movXX patterns but not from LO_SUM, and not adding wY constraint? Thanks, David
Re: PR53914, rs6000 constraints and reload queries
On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote: and fixes pr54009. David, in looking over this today, I realised that this bug isn't completely fixed. I stopped gcc emitting an offset of 32768, but that isn't enough. lo_sum addresses can't be offset, except when you know the alignment of the object being addressed guarantees that no part crosses a 32k boundary. For example, given lis 9,x@ha; lwz 3,x@l(9); lwz 4,x+4@l(9); we run into trouble if x happens to reside at n*64k + 32764. The final address matters, not just the offset in the insn. So I have some changes to mem_operand_gpr and rs6000_secondary_reload. I'll post a patch on top of the previous one. Another thing. I reckon we can do without the 'wY' constraint. I implemented it because movtf_internal currently uses an o constraint, but it seems to me that rs6000_legitimate_address_p already prevents non-offsettable TFmode addresses for mems accessed by fprs. Geoff introduced the o on movtf here: http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html Thoughts? -- Alan Modra Australia Development Lab, IBM
Re: PR53914, rs6000 constraints and reload queries
On Fri, Jul 20, 2012 at 12:05:28AM +0930, Alan Modra wrote: On Thu, Jul 19, 2012 at 12:34:25PM +0930, Alan Modra wrote: and fixes pr54009. David, in looking over this today, I realised that this bug isn't completely fixed. I stopped gcc emitting an offset of 32768, but that isn't enough. lo_sum addresses can't be offset, except when you know the alignment of the object being addressed guarantees that no part crosses a 32k boundary. For example, given lis 9,x@ha; lwz 3,x@l(9); lwz 4,x+4@l(9); we run into trouble if x happens to reside at n*64k + 32764. The final address matters, not just the offset in the insn. So I have some changes to mem_operand_gpr and rs6000_secondary_reload. I'll post a patch on top of the previous one. Actually, maybe I won't. When I disabled offsetting LO_SUMs, except for tocrefs (which we know are aligned), I ran into an insn does not satisfy its constraints ICE on gcc.dg/pr28795-2. The cause: code in rs6000_legitimze_reload_address near the comment starting /* Don't do this for TFmode. There we have a list of modes from which we make assumptions about the regs used, and get it wrong for ppc32. Reload decided to use 2 gprs for a DFmode. We've been doing this forever, and almost always the mem is aligned. So fixing an esoteric testcase like the one in pr54009 would come at the expense of normal code quality. Unless I can reliably figure out regs used and alignment of syms in rs6000_legitimize_reload_address, and as far as I can tell that isn't possible. Another thing. I reckon we can do without the 'wY' constraint. I implemented it because movtf_internal currently uses an o constraint, but it seems to me that rs6000_legitimate_address_p already prevents non-offsettable TFmode addresses for mems accessed by fprs. Geoff introduced the o on movtf here: http://gcc.gnu.org/ml/gcc-patches/2003-12/msg00803.html Thoughts? This on the other hand works. Please consider the patch amended to remove mem_operand_fpr, the 'wY' constraint, and uses thereof replaced with m. -- Alan Modra Australia Development Lab, IBM
Re: PR53914, rs6000 constraints and reload queries
Thanks very much Uli for verifying my conclusions about reload, operand predicates and constraints, and particularly the general unusability of the o constraint. Re http://gcc.gnu.org/ml/gcc/2012-07/msg00142.html, this patch adds the missing secondary reload patterns, corrects constraints I got wrong (?*d, not *?d), and fixes pr54009. Uli said: An address involving pseudos should be considered legitimate if there exists an assignment of hard registers that makes it strictly legitimate (not if *any* such assignment would be strictly legitimate). [ It might make sense in some cases to make the check stricter; for example if we know that an address would nearly always require a reload, we might choose to completely reject it if that actually increases performance. But that would be just performance tuning, not required for correctness ... ] So there is quite a bit more work in rs6000.c to fully implement this. See ??? comments that I added on code handling lo_sum, and I'll admit to not even trying to relax rs6000_legitimate_offset_address_p conditions for e500. That can wait for another day. The patch is large enough already. Some notes: - word_offset_memref_operand isn't used as a predicate and as both Uli and I noted, constraints calling predicates lead to trouble with reload_legitimize_address output. So move it out of predicates.md to rs6000.c (renamed as mem_operand_gpr and without checks more suited to predicates). - where I changed a bunch of mode tests to GET_MODE_SIZE checks, the original mode list missing TImode is irrelevant for 32-bit, since TImode isn't supported on 32-bit (Why do we have 32-bit TImode insns?) - reordering insn alternatives in some cases is cosmetic. As the comments say, putting r-Y and Y-r before r-r is necessary, but reordering d-m,m-d,d-d isn't strictly necessary. I did that for consistency, and future proofing should the m constraint need to be changed. Putting r-Y before Y-r is also cosmetic but I prefer it that way for insns that land in reload as pseudo-pseudo ie. mem-mem, where both load and store alternatives match with reloading. I think it's nicer to choose input reloads rather than output reloads, so put the store first. - I haven't actually seen the 32-bit gpr secondary reload patterns trigger (it's hard to make a testcase), so that code is largely untested. Fortunately the code is very similar to the 64-bit gpr secondary reload code. - movdf_hardfloat32 insn lengths looked wrong to me, so I fixed that. gpr load and store ought to be just two insns, not four. I also took out the ?? kludge since the offsettable address problem is now fixed. - I don't really like disparaging fprs in a number of DImode insns, but without that reload prefers to reload inputs. So you get code like stw 10,xxx(1); stw 11,xxx+4(1); lfd 0,xxx(1); stfd 0,32764(9); rather than addi 9,9,32764; stw 10,0(9); stw 11,4(9); The former is slower and requires a stack frame. Bootstrapped and regression tested powerpc-linux. OK to apply? PR target/53914 PR target/54009 * config/rs6000/constraints.md (Y): Use mem_operand_gpr. (wY): New constraint using mem_operand_fpr. * config/rs6000/predicates.md (word_offset_memref_operand): Delete. Adjust all rs6000_legitimate_offset_address_p calls. * config/rs6000/rs6000-protos.h (mem_operand_gpr): Declare. (mem_operand_fpr, rs6000_secondary_reload_gpr): Declare. (rs6000_legitimate_offset_address_p): Update prototype. (rs6000_offsettable_memref_p): Delete. (rs6000_secondary_reload_ppc64): Delete. * config/rs6000/rs6000.c (address_offset): New function. (mem_operand_gpr, mem_operand_fpr): Likewise. (rs6000_legitimate_offset_address_p): Add worst_case param. When not worst_case assume class of regs with least restrictive offsets. Adjust all calls. (legitimate_lo_sum_address_p): Simplify register mode tests. (rs6000_legitimize_address): Likewise. Assume best case offset addressing. Combine ELF and MACHO lo_sum code. (rs6000_mode_dependent_address): Correct offset addressing limits. (rs6000_offsettable_memref_p): Make static, add reg_mode param. Use reg_mode to help rs6000_legitimate_offset_address_p. (rs6000_secondary_reload): Use address_offset. Handle 32-bit multi gpr load/store when offset too large. (rs6000_secondary_reload_gpr): Renamed rs6000_secondary_reload_ppc64. (rs6000_split_multireg_move): Adjust rs6000_offsettable_memref_p calls. * config/rs6000/rs6000.md (movdf_hardfloat32): Use 'Y' constraint for gpr load/store. Order alternatives as r-Y,Y-r,r-r and d-m,m-d,d-d. Correct size of gpr load/store. (movdf_softfloat32): Use 'Y' constraint for gpr load/store. Order alternatives. (movti_ppc64): Likewise. (movdi_internal32): Likewise. Also disparage fprs.