Re: rl78 vs cse vs memory_address_addr_space

2015-07-07 Thread Segher Boessenkool
On Mon, Jul 06, 2015 at 04:58:36PM -0500, Segher Boessenkool wrote:
 On Mon, Jul 06, 2015 at 04:45:35PM -0400, DJ Delorie wrote:
  Combine gets as far as this:
  
  Trying 5 - 9:
  Failed to match this instruction:
  (parallel [
  (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union 
  un_per0 *)240B].BIT.no4+0 S1 A16])
  (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile 
  union un_per0 *)240B].BIT.no4+0 S1 A16])
  (const_int 16 [0x10])))
  (set (reg/f:HI 43)
  (const_int 240 [0xf0]))
  ])
  
  (the set is left behind because it's used for the second assignment)
  
  Both of those insns in the parallel are valid rl78 insns.  I tried
  adding that parallel as a define-and-split but combine doesn't split
  it at the point where it inserts it, so it doesn't work right.  If it
  reduced those four instructions to the two in the parallel, but
  without the parallel, it would probably work too.
 
 Did you try just a define_split instead?  Ugly, but it should work I think.

So I built a rl78-elf cross and tried the example (s/union __BITS9/__BITS8/).
I couldn't get your exact code, I guess I need some special options?
But before combine it looks the same.

There is no combination of four instructions, the only combinations
combine makes here are 2-1 combinations, and they all succeed, except
that last one.  Since combine won't do 2-2 no split will work.  And,
as far as combine is concerned, a define_insn_and_split is a define_insn,
so that indeed won't help you either.

What you need is some form of uncse, like the attached hack (which doesn't
work in this case because the rtx costs prohibit it).


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index b97aa10..20d526f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3928,8 +3938,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
XVECLEN (newpat, 0) == 2
GET_CODE (XVECEXP (newpat, 0, 0)) == SET
GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-   (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
- || set_noop_p (XVECEXP (newpat, 0, 1)))
+   (1
+  || i1
+  || set_noop_p (XVECEXP (newpat, 0, 0))
+  || set_noop_p (XVECEXP (newpat, 0, 1)))
GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT


Re: rl78 vs cse vs memory_address_addr_space

2015-07-06 Thread Jeff Law

On 07/01/2015 10:14 PM, DJ Delorie wrote:

In this bit of code in explow.c:

   /* By passing constant addresses through registers
  we get a chance to cse them.  */
   if (! cse_not_expected  CONSTANT_P (x)  CONSTANT_ADDRESS_P (x))
 x = force_reg (address_mode, x);

On the rl78 it results in code that's a bit too complex for later
passes to be optimized fully.  Is there any way to indicate that the
above force_reg() is bad for a particular target?
I believe this used to be conditional on -fforce-mem or -fforce-reg or 
some such option that we deprecated long ago.


It'd be helpful if you could be more specific about what can't be 
handled.  combine for example was extended to handle larger chains of 
insns not terribly long ago.


jeff



Re: rl78 vs cse vs memory_address_addr_space

2015-07-06 Thread DJ Delorie

Given a test case like this:

typedef struct {
   unsigned char no0 :1;
   unsigned char no1 :1;
   unsigned char no2 :1;
   unsigned char no3 :1;
   unsigned char no4 :1;
   unsigned char no5 :1;
   unsigned char no6 :1;
   unsigned char no7 :1;
} __BITS8;
#define SFR0_bit (*(volatile union __BITS9 *)0x0)
#define SFREN SFR0_bit.no4

foo() {
   SFREN = 1U;
   SFREN = 0U;
}


(i.e. any code that sets/clears one bit in a volatile memory-mapped
area, which the rl78 has instructions for)

Before:

(insn 5 2 7 2 (set (reg/f:HI 43)
(const_int 240 [0xf0])) test.c:24 7 {*movhi_virt}
 (nil))
(insn 7 5 8 2 (set (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
(mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])) test.c:24 5 {movqi_virt}
 (nil))
(insn 8 7 9 2 (set (reg:QI 46)
(ior:QI (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
(const_int 16 [0x10]))) test.c:24 19 {*iorqi3_virt}
 (expr_list:REG_DEAD (reg:QI 45 [ MEM[(volatile union un_per0 
*)240B].BIT.no4 ])
(nil)))
(insn 9 8 12 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])
(reg:QI 46)) test.c:24 5 {movqi_virt}
 (expr_list:REG_DEAD (reg:QI 46)
(nil)))
(insn 12 9 13 2 (set (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
(mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])) test.c:26 5 {movqi_virt}
 (nil))
(insn 13 12 14 2 (set (reg:QI 50)
(and:QI (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
(const_int -17 [0xffef]))) test.c:26 18 {*andqi3_virt}
 (expr_list:REG_DEAD (reg:QI 49 [ MEM[(volatile union un_per0 
*)240B].BIT.no4 ])
(nil)))
(insn 14 13 0 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])
(reg:QI 50)) test.c:26 5 {movqi_virt}
 (expr_list:REG_DEAD (reg:QI 50)
(expr_list:REG_DEAD (reg/f:HI 43)
(nil

Combine gets as far as this:

Trying 5 - 9:
Failed to match this instruction:
(parallel [
(set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])
(ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union 
un_per0 *)240B].BIT.no4+0 S1 A16])
(const_int 16 [0x10])))
(set (reg/f:HI 43)
(const_int 240 [0xf0]))
])

(the set is left behind because it's used for the second assignment)

Both of those insns in the parallel are valid rl78 insns.  I tried
adding that parallel as a define-and-split but combine doesn't split
it at the point where it inserts it, so it doesn't work right.  If it
reduced those four instructions to the two in the parallel, but
without the parallel, it would probably work too.

We end up with code like this:

movwr8, #240 ; 5*movhi_real/4   [length 
= 4]
movwax, r8   ; 19   *movhi_real/5   [length 
= 4]
movwhl, ax   ; 21   *movhi_real/6   [length 
= 4]
set1[hl].4   ; 9*iorqi3_real/1  [length 
= 4]
clr1[hl].4   ; 14   *andqi3_real/1  [length 
= 4]

but what we want is this:

set1!240.4   ; 9*iorqi3_real/1  [length 
= 4]
clr1!240.4   ; 14   *andqi3_real/1  [length 
= 4]

( !240 means (mem (const_int 240)) )

(if there's only one such operation in a function, it combines
properly, likely because the address is not needed after the insn it
can combine, unlike the parallel above)

The common addresses are separated at least before lowering to RTL; as the 
initial
expansion has:

;; MEM[(volatile union un_per0 *)240B].BIT.no4 ={v} 1;

(insn 5 4 7 (set (reg/f:HI 43)
(const_int 240 [0xf0])) test.c:24 -1
 (nil))

(insn 7 5 8 (set (reg:QI 45)
(mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])) test.c:24 -1
 (nil))

(insn 8 7 9 (set (reg:QI 46)
(ior:QI (reg:QI 45)
(const_int 16 [0x10]))) test.c:24 -1
 (nil))

(insn 9 8 0 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 
*)240B].BIT.no4+0 S1 A16])
(reg:QI 46)) test.c:24 -1
 (nil))


Yes, I know gcc doesn't like combining volatile accesses into one
insn, but the rl78 backend (my copy at least) has predicates that
allow it, because it's safe on rl78.

Also, if I take out the volatile yet put some sort of barrier (like
a volatile asm) between the two assignments, it still fails, in the
same manner.


Re: rl78 vs cse vs memory_address_addr_space

2015-07-06 Thread Segher Boessenkool
On Mon, Jul 06, 2015 at 04:45:35PM -0400, DJ Delorie wrote:
 Combine gets as far as this:
 
 Trying 5 - 9:
 Failed to match this instruction:
 (parallel [
 (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union 
 un_per0 *)240B].BIT.no4+0 S1 A16])
 (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union 
 un_per0 *)240B].BIT.no4+0 S1 A16])
 (const_int 16 [0x10])))
 (set (reg/f:HI 43)
 (const_int 240 [0xf0]))
 ])
 
 (the set is left behind because it's used for the second assignment)
 
 Both of those insns in the parallel are valid rl78 insns.  I tried
 adding that parallel as a define-and-split but combine doesn't split
 it at the point where it inserts it, so it doesn't work right.  If it
 reduced those four instructions to the two in the parallel, but
 without the parallel, it would probably work too.

Did you try just a define_split instead?  Ugly, but it should work I think.


Segher


Re: rl78 vs cse vs memory_address_addr_space

2015-07-06 Thread DJ Delorie

 Did you try just a define_split instead?  Ugly, but it should work I think.

It doesn't seem to be able to match a define_split :-(


rl78 vs cse vs memory_address_addr_space

2015-07-01 Thread DJ Delorie

In this bit of code in explow.c:

  /* By passing constant addresses through registers
 we get a chance to cse them.  */
  if (! cse_not_expected  CONSTANT_P (x)  CONSTANT_ADDRESS_P (x))
x = force_reg (address_mode, x);

On the rl78 it results in code that's a bit too complex for later
passes to be optimized fully.  Is there any way to indicate that the
above force_reg() is bad for a particular target?