Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-27 Thread Jeff Law

On 02/17/14 02:28, Eric Botcazou wrote:


Although it's probably time to concede defeat in this particular case, I don't
think we should use the NONJUMP_INSN_P big hammer because we want to eliminate
branches if possible.  So I'd just add the minimal test to the two existing
conditions so as to prevent the invalid RTL from being created, e.g.

(!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)

Done.  Attached is the final iteration that got installed.


Verified with a cross compiler.  Bootstrap on m68k will start at some 
point and finish in the next week or so.


jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a20cee3..16c499b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2014-02-27  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/52714
+   * combine.c (try_combine): When splitting an unrecognized PARALLEL
+   into two independent simple sets, if I3 is a jump, ensure the
+   pattern we place into I3 is a (set (pc) ...)
+
 2014-02-27  Mikael Pettersson  mi...@it.uu.se
Jeff Law  l...@redhat.com
 
diff --git a/gcc/combine.c b/gcc/combine.c
index 1b598b4..fc473b6 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3712,6 +3712,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int 
*new_direct_jump_p,
 #ifdef HAVE_cc0
   !reg_referenced_p (cc0_rtx, set0)
 #endif
+ /* If I3 is a jump, ensure that set0 is a jump so that
+we do not create invalid RTL.  */
+  (!JUMP_P (i3) || SET_DEST (set0) == pc_rtx)
 )
{
  newi2pat = set1;
@@ -3726,6 +3729,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int 
*new_direct_jump_p,
 #ifdef HAVE_cc0
!reg_referenced_p (cc0_rtx, set1)
 #endif
+  /* If I3 is a jump, ensure that set1 is a jump so that
+ we do not create invalid RTL.  */
+   (!JUMP_P (i3) || SET_DEST (set1) == pc_rtx)
  )
{
  newi2pat = set0;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index be4cb12..b95bcfd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-27  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/52714
+   * gcc.c-torture/compile/pr52714.c: New test.
+
 2014-02-27  Mikael Pettersson  mi...@it.uu.se
 Jeff Law  l...@redhat.com
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c 
b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
new file mode 100644
index 000..03d4990
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
@@ -0,0 +1,25 @@
+
+int __re_compile_fastmap(unsigned char *p)
+{
+unsigned char **stack;
+unsigned size;
+unsigned avail;
+
+stack = __builtin_alloca(5 * sizeof(unsigned char*));
+if (stack == 0)
+   return -2;
+size = 5;
+avail = 0;
+
+for (;;) {
+   switch (*p++) {
+   case 0:
+   if (avail == size)
+   return -2;
+   stack[avail++] = p;
+   }
+}
+
+return 0;
+}
+


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-24 Thread Jeff Law

On 02/17/14 02:28, Eric Botcazou wrote:

So, the code has this structure

if (looks safe)
emit in existing order
else if (reverse order looks safe)
emit in reversed order
else
undo_all


In this specific case, the existing order is never going to look safe
because set1 uses (sp) as an input argument and use_crosses_set_p is
very conservative when the value is the stack pointer on a PUSH_ROUNDING
machine (such as the m68k)


What a bummer, it seems all things are aligned so as to block the optimization
here...  It is a bit of a shame, as this is a powerful combination that would
eliminate a branch if it succeeded.  The combination of the 3 insns yields the
(set (pc) (pc)) pattern and it's only because added_sets_1 is true that the
annoying PARALLEL is created.
Yea.  As I mentioned before, I actually considered peeking inside the 
vector for a (set (pc) (pc)) and simplifying vector (possibly back down 
to a single set).  I really like eliminating branches ;-0


But I couldn't convince myself it'd be worth the effort.




So we could put the verification code that both I3 and I2 are INSNs in
the else if (reverse order looks safe) clause.That would mean for
this testcase, we ultimately undo_all.  But I consider that reasonable
given the only reason this instance bled into RTL land was -O1 instead
of -O2 compilation.


Although it's probably time to concede defeat in this particular case, I don't
think we should use the NONJUMP_INSN_P big hammer because we want to eliminate
branches if possible.  So I'd just add the minimal test to the two existing
conditions so as to prevent the invalid RTL from being created, e.g.

(!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)

OK.  I'll give that a spin.

jeff


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-17 Thread Eric Botcazou
 So, the code has this structure
 
 if (looks safe)
emit in existing order
 else if (reverse order looks safe)
emit in reversed order
 else
undo_all
 
 
 In this specific case, the existing order is never going to look safe
 because set1 uses (sp) as an input argument and use_crosses_set_p is
 very conservative when the value is the stack pointer on a PUSH_ROUNDING
 machine (such as the m68k)

What a bummer, it seems all things are aligned so as to block the optimization 
here...  It is a bit of a shame, as this is a powerful combination that would 
eliminate a branch if it succeeded.  The combination of the 3 insns yields the 
(set (pc) (pc)) pattern and it's only because added_sets_1 is true that the 
annoying PARALLEL is created.

 So we could put the verification code that both I3 and I2 are INSNs in
 the else if (reverse order looks safe) clause.That would mean for
 this testcase, we ultimately undo_all.  But I consider that reasonable
 given the only reason this instance bled into RTL land was -O1 instead
 of -O2 compilation.

Although it's probably time to concede defeat in this particular case, I don't 
think we should use the NONJUMP_INSN_P big hammer because we want to eliminate 
branches if possible.  So I'd just add the minimal test to the two existing 
conditions so as to prevent the invalid RTL from being created, e.g.

   (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx)

-- 
Eric Botcazou


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-14 Thread Jeff Law

On 02/11/14 02:06, Eric Botcazou wrote:

I pondered changing the condition for swapping the insn order, but it
didn't seem worth it.  I doubt we see many 3-2 combinations where I3 is
a JUMP_INSN, the result turns into two simple sets and the insn swapping
code you wrote decides it needs to swap the insns.


I didn't actually write it, just enhanced it recently, that's why I suggested
to do the same here.  It's one line of code and we have an example of valid
simplification at hand so I think we ought to do it.


It seems to me that as long as we're re-using the existing insns to
contain the simple sets that we have to ensure that they're INSNs, not
CALL_INSNs or JUMP_INSNs.


I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a
standard method in the combiner.

So, the code has this structure

if (looks safe)
  emit in existing order
else if (reverse order looks safe)
  emit in reversed order
else
  undo_all


In this specific case, the existing order is never going to look safe 
because set1 uses (sp) as an input argument and use_crosses_set_p is 
very conservative when the value is the stack pointer on a PUSH_ROUNDING 
machine (such as the m68k)


So we could put the verification code that both I3 and I2 are INSNs in 
the else if (reverse order looks safe) clause.That would mean for 
this testcase, we ultimately undo_all.  But I consider that reasonable 
given the only reason this instance bled into RTL land was -O1 instead 
of -O2 compilation.


I already know that variant works as it's what I had before I started 
thinking about what happens if we have a CALL_INSN as I2.


Jeff



Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-11 Thread Eric Botcazou
 I pondered changing the condition for swapping the insn order, but it
 didn't seem worth it.  I doubt we see many 3-2 combinations where I3 is
 a JUMP_INSN, the result turns into two simple sets and the insn swapping
 code you wrote decides it needs to swap the insns.

I didn't actually write it, just enhanced it recently, that's why I suggested 
to do the same here.  It's one line of code and we have an example of valid 
simplification at hand so I think we ought to do it.

 It seems to me that as long as we're re-using the existing insns to
 contain the simple sets that we have to ensure that they're INSNs, not
 CALL_INSNs or JUMP_INSNs.

I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a 
standard method in the combiner.

-- 
Eric Botcazou


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-11 Thread Jeff Law

On 02/11/14 02:06, Eric Botcazou wrote:

I pondered changing the condition for swapping the insn order, but it
didn't seem worth it.  I doubt we see many 3-2 combinations where I3 is
a JUMP_INSN, the result turns into two simple sets and the insn swapping
code you wrote decides it needs to swap the insns.


I didn't actually write it, just enhanced it recently, that's why I suggested
to do the same here.  It's one line of code and we have an example of valid
simplification at hand so I think we ought to do it.


It seems to me that as long as we're re-using the existing insns to
contain the simple sets that we have to ensure that they're INSNs, not
CALL_INSNs or JUMP_INSNs.


I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a
standard method in the combiner.
Right, but most of the time we're doing a 2-1 or 3-1 that turns the 
conditional jump into a nop -- at which point all we do is modify I3 
(which is the conditional jump) into a nop-jump.  That common case 
shouldn't be affected by my change.


It's only when we have a 3-2 or 4-2 combination where one of the 
earlier insns has a side effect we need to keep and we're able to 
simplify the conditional into a NOP that would be affected by my change. 
 And I suspect that is much more rare.


Regardless, I'll see what I can do.

THanks,
Jeff


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-09 Thread Jeff Law

On 02/08/14 03:44, Eric Botcazou wrote:

This is then combined into:

newpat = (parallel [
  (set (pc)
  (pc))
  (set (reg/v/f:SI 34 [ stack ])
  (reg/f:SI 15 %sp))
  ])

This isn't a recognized insn, and it needs to be split.  Since it's just
a parallel of independent sets, combine tries to split it into a pair of
assignments which look like:

(insn 16 14 17 2 (set (pc)
  (pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
   (nil))
(jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
  (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
   (int_list:REG_BR_PROB 1014 (nil))
   - 40)


Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.


ISTM the code in combine which tries to rip apart a PARALLEL like that
needs to ensure that I2 and I3 are both INSNs.


That seems to be an unnecessary pessimization given that the combination looks
perfectly valid if you swap the insns.  Can't we enhance the code just below
which chooses the order of the insns after splitting?
I pondered changing the condition for swapping the insn order, but it 
didn't seem worth it.  I doubt we see many 3-2 combinations where I3 is 
a JUMP_INSN, the result turns into two simple sets and the insn swapping 
code you wrote decides it needs to swap the insns.


I also pondered identifying the nop set within the parallel  taking it 
apart again.  Again, I rejected as I suspect it doesn't happen enough in 
practice to make it worth the effort.


I pondered adding  the guard for the newi2pat = set0; newpat = set1 case 
where we actually swap the insns.  But realized we could run into 
similar problems in the prior conditional if I2 was a CALL_INSN that we 
somehow simplified.


It seems to me that as long as we're re-using the existing insns to 
contain the simple sets that we have to ensure that they're INSNs, not 
CALL_INSNs or JUMP_INSNs.


I also pondered resetting the code on each insn via SET_CODE (whatever, 
INSN).  It's rather hacky, but the formats are close enough that it 
would work.  Then I realized that we can still undo_all later and didn't 
want to add compensation code to put the original code back.


Jeff


Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-08 Thread Eric Botcazou
 This is then combined into:
 
 newpat = (parallel [
  (set (pc)
  (pc))
  (set (reg/v/f:SI 34 [ stack ])
  (reg/f:SI 15 %sp))
  ])
 
 This isn't a recognized insn, and it needs to be split.  Since it's just
 a parallel of independent sets, combine tries to split it into a pair of
 assignments which look like:
 
 (insn 16 14 17 2 (set (pc)
  (pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
   (nil))
 (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
  (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
   (int_list:REG_BR_PROB 1014 (nil))
   - 40)
 
 
 Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.
 
 
 ISTM the code in combine which tries to rip apart a PARALLEL like that
 needs to ensure that I2 and I3 are both INSNs.

That seems to be an unnecessary pessimization given that the combination looks 
perfectly valid if you swap the insns.  Can't we enhance the code just below 
which chooses the order of the insns after splitting?

-- 
Eric Botcazou


[RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-06 Thread Jeff Law


As mentioned in the PR, we call try_combine with:


i1 = (insn 14 13 16 2 (set (reg/v/f:SI 34 [ stack ])
(reg/f:SI 15 %sp)) pr52714.c:9 38 {*movsi_m68k2}
 (nil))
i2 = (insn 16 14 17 2 (set (cc0)
(compare (reg/v/f:SI 34 [ stack ])
(const_int 0 [0]))) pr52714.c:10 4 {*tstsi_internal_68020_cf}
 (nil))
i3 = (jump_insn 17 16 18 2 (set (pc)
(if_then_else (eq (cc0)
(const_int 0 [0]))
(label_ref 40)
(pc))) pr52714.c:10 390 {beq}
 (int_list:REG_BR_PROB 1014 (nil))
 - 40)

This is then combined into:

newpat = (parallel [
(set (pc)
(pc))
(set (reg/v/f:SI 34 [ stack ])
(reg/f:SI 15 %sp))
])

This isn't a recognized insn, and it needs to be split.  Since it's just 
a parallel of independent sets, combine tries to split it into a pair of 
assignments which look like:


(insn 16 14 17 2 (set (pc)
(pc)) pr52714.c:10 2147483647 {NOOP_MOVE}
 (nil))
(jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ])
(reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2}
 (int_list:REG_BR_PROB 1014 (nil))
 - 40)


Note how the second is a JUMP_INSN, but it doesn't modify the PC.  Opps.


ISTM the code in combine which tries to rip apart a PARALLEL like that 
needs to ensure that I2 and I3 are both INSNs.


The astute reader would notice that this wasn't optimized at the tree 
level.  That's an artifact of using -O1 instead of -O2.  At -O2 VRP 
would come in and zap that test well before it ever expanded into RTL.


Verified all 3 tests in the PR are fixed (two full, one reduced). 
Reduced test to be added to the test suite.


Bootstrap and regression test in progress on x86_64-unknown-linux-gnu. 
OK if that passes without incident?




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e489b62..7983139 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2014-02-06  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/52714
+   * combine.c (try_combine): When splitting an unrecognized PARALLEL
+   into two independent simple sets, make sure both I3 and I2 are
+   INSNs (as opposed to JUMP_INSNs or CALL_INSNs).
+
 2014-02-06  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
* config/arm/aarch-cost-tables.h (cortexa57_extra_costs): New table.
diff --git a/gcc/combine.c b/gcc/combine.c
index fd4294b..12db3a3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3692,7 +3692,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int 
*new_direct_jump_p,
! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
  XVECEXP (newpat, 0, 1))
! (contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 0)))
- contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1)
+ contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1
+  /* Make sure both I2 and I3 are simple INSNs, otherwise we might
+ create a JUMP_INSN with a PATTERN that is a simple set.  */
+   GET_CODE (i2) == INSN
+   GET_CODE (i3) == INSN)
 {
   rtx set0 = XVECEXP (newpat, 0, 0);
   rtx set1 = XVECEXP (newpat, 0, 1);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2cac8d2..5d6e066 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-06  Jeff Law  l...@redhat.com
+
+   PR rtl-optimization/52714
+   * gcc.c-torture/compile/pr52714.c: New test.
+
 2014-02-06  Jakub Jelinek  ja...@redhat.com
 
PR target/59575
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c 
b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
new file mode 100644
index 000..03d4990
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c
@@ -0,0 +1,25 @@
+
+int __re_compile_fastmap(unsigned char *p)
+{
+unsigned char **stack;
+unsigned size;
+unsigned avail;
+
+stack = __builtin_alloca(5 * sizeof(unsigned char*));
+if (stack == 0)
+   return -2;
+size = 5;
+avail = 0;
+
+for (;;) {
+   switch (*p++) {
+   case 0:
+   if (avail == size)
+   return -2;
+   stack[avail++] = p;
+   }
+}
+
+return 0;
+}
+