Re: rl78 vs cse vs memory_address_addr_space
On Mon, Jul 06, 2015 at 04:58:36PM -0500, Segher Boessenkool wrote: On Mon, Jul 06, 2015 at 04:45:35PM -0400, DJ Delorie wrote: Combine gets as far as this: Trying 5 - 9: Failed to match this instruction: (parallel [ (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (const_int 16 [0x10]))) (set (reg/f:HI 43) (const_int 240 [0xf0])) ]) (the set is left behind because it's used for the second assignment) Both of those insns in the parallel are valid rl78 insns. I tried adding that parallel as a define-and-split but combine doesn't split it at the point where it inserts it, so it doesn't work right. If it reduced those four instructions to the two in the parallel, but without the parallel, it would probably work too. Did you try just a define_split instead? Ugly, but it should work I think. So I built a rl78-elf cross and tried the example (s/union __BITS9/__BITS8/). I couldn't get your exact code, I guess I need some special options? But before combine it looks the same. There is no combination of four instructions, the only combinations combine makes here are 2-1 combinations, and they all succeed, except that last one. Since combine won't do 2-2 no split will work. And, as far as combine is concerned, a define_insn_and_split is a define_insn, so that indeed won't help you either. What you need is some form of uncse, like the attached hack (which doesn't work in this case because the rtx costs prohibit it). Segher diff --git a/gcc/combine.c b/gcc/combine.c index b97aa10..20d526f 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3928,8 +3938,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, XVECLEN (newpat, 0) == 2 GET_CODE (XVECEXP (newpat, 0, 0)) == SET GET_CODE (XVECEXP (newpat, 0, 1)) == SET - (i1 || set_noop_p (XVECEXP (newpat, 0, 0)) - || set_noop_p (XVECEXP (newpat, 0, 1))) + (1 + || i1 + || set_noop_p (XVECEXP (newpat, 0, 0)) + || set_noop_p (XVECEXP (newpat, 0, 1))) GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
Re: rl78 vs cse vs memory_address_addr_space
On 07/01/2015 10:14 PM, DJ Delorie wrote: In this bit of code in explow.c: /* By passing constant addresses through registers we get a chance to cse them. */ if (! cse_not_expected CONSTANT_P (x) CONSTANT_ADDRESS_P (x)) x = force_reg (address_mode, x); On the rl78 it results in code that's a bit too complex for later passes to be optimized fully. Is there any way to indicate that the above force_reg() is bad for a particular target? I believe this used to be conditional on -fforce-mem or -fforce-reg or some such option that we deprecated long ago. It'd be helpful if you could be more specific about what can't be handled. combine for example was extended to handle larger chains of insns not terribly long ago. jeff
Re: rl78 vs cse vs memory_address_addr_space
Given a test case like this: typedef struct { unsigned char no0 :1; unsigned char no1 :1; unsigned char no2 :1; unsigned char no3 :1; unsigned char no4 :1; unsigned char no5 :1; unsigned char no6 :1; unsigned char no7 :1; } __BITS8; #define SFR0_bit (*(volatile union __BITS9 *)0x0) #define SFREN SFR0_bit.no4 foo() { SFREN = 1U; SFREN = 0U; } (i.e. any code that sets/clears one bit in a volatile memory-mapped area, which the rl78 has instructions for) Before: (insn 5 2 7 2 (set (reg/f:HI 43) (const_int 240 [0xf0])) test.c:24 7 {*movhi_virt} (nil)) (insn 7 5 8 2 (set (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:24 5 {movqi_virt} (nil)) (insn 8 7 9 2 (set (reg:QI 46) (ior:QI (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (const_int 16 [0x10]))) test.c:24 19 {*iorqi3_virt} (expr_list:REG_DEAD (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (nil))) (insn 9 8 12 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (reg:QI 46)) test.c:24 5 {movqi_virt} (expr_list:REG_DEAD (reg:QI 46) (nil))) (insn 12 9 13 2 (set (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:26 5 {movqi_virt} (nil)) (insn 13 12 14 2 (set (reg:QI 50) (and:QI (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (const_int -17 [0xffef]))) test.c:26 18 {*andqi3_virt} (expr_list:REG_DEAD (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ]) (nil))) (insn 14 13 0 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (reg:QI 50)) test.c:26 5 {movqi_virt} (expr_list:REG_DEAD (reg:QI 50) (expr_list:REG_DEAD (reg/f:HI 43) (nil Combine gets as far as this: Trying 5 - 9: Failed to match this instruction: (parallel [ (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (const_int 16 [0x10]))) (set (reg/f:HI 43) (const_int 240 [0xf0])) ]) (the set is left behind because it's used for the second assignment) Both of those insns in the parallel are valid rl78 insns. I tried adding that parallel as a define-and-split but combine doesn't split it at the point where it inserts it, so it doesn't work right. If it reduced those four instructions to the two in the parallel, but without the parallel, it would probably work too. We end up with code like this: movwr8, #240 ; 5*movhi_real/4 [length = 4] movwax, r8 ; 19 *movhi_real/5 [length = 4] movwhl, ax ; 21 *movhi_real/6 [length = 4] set1[hl].4 ; 9*iorqi3_real/1 [length = 4] clr1[hl].4 ; 14 *andqi3_real/1 [length = 4] but what we want is this: set1!240.4 ; 9*iorqi3_real/1 [length = 4] clr1!240.4 ; 14 *andqi3_real/1 [length = 4] ( !240 means (mem (const_int 240)) ) (if there's only one such operation in a function, it combines properly, likely because the address is not needed after the insn it can combine, unlike the parallel above) The common addresses are separated at least before lowering to RTL; as the initial expansion has: ;; MEM[(volatile union un_per0 *)240B].BIT.no4 ={v} 1; (insn 5 4 7 (set (reg/f:HI 43) (const_int 240 [0xf0])) test.c:24 -1 (nil)) (insn 7 5 8 (set (reg:QI 45) (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:24 -1 (nil)) (insn 8 7 9 (set (reg:QI 46) (ior:QI (reg:QI 45) (const_int 16 [0x10]))) test.c:24 -1 (nil)) (insn 9 8 0 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (reg:QI 46)) test.c:24 -1 (nil)) Yes, I know gcc doesn't like combining volatile accesses into one insn, but the rl78 backend (my copy at least) has predicates that allow it, because it's safe on rl78. Also, if I take out the volatile yet put some sort of barrier (like a volatile asm) between the two assignments, it still fails, in the same manner.
Re: rl78 vs cse vs memory_address_addr_space
On Mon, Jul 06, 2015 at 04:45:35PM -0400, DJ Delorie wrote: Combine gets as far as this: Trying 5 - 9: Failed to match this instruction: (parallel [ (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16]) (const_int 16 [0x10]))) (set (reg/f:HI 43) (const_int 240 [0xf0])) ]) (the set is left behind because it's used for the second assignment) Both of those insns in the parallel are valid rl78 insns. I tried adding that parallel as a define-and-split but combine doesn't split it at the point where it inserts it, so it doesn't work right. If it reduced those four instructions to the two in the parallel, but without the parallel, it would probably work too. Did you try just a define_split instead? Ugly, but it should work I think. Segher
Re: rl78 vs cse vs memory_address_addr_space
Did you try just a define_split instead? Ugly, but it should work I think. It doesn't seem to be able to match a define_split :-(
rl78 vs cse vs memory_address_addr_space
In this bit of code in explow.c: /* By passing constant addresses through registers we get a chance to cse them. */ if (! cse_not_expected CONSTANT_P (x) CONSTANT_ADDRESS_P (x)) x = force_reg (address_mode, x); On the rl78 it results in code that's a bit too complex for later passes to be optimized fully. Is there any way to indicate that the above force_reg() is bad for a particular target?