Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/17/14 02:28, Eric Botcazou wrote: Although it's probably time to concede defeat in this particular case, I don't think we should use the NONJUMP_INSN_P big hammer because we want to eliminate branches if possible. So I'd just add the minimal test to the two existing conditions so as to prevent the invalid RTL from being created, e.g. (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx) Done. Attached is the final iteration that got installed. Verified with a cross compiler. Bootstrap on m68k will start at some point and finish in the next week or so. jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a20cee3..16c499b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2014-02-27 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * combine.c (try_combine): When splitting an unrecognized PARALLEL + into two independent simple sets, if I3 is a jump, ensure the + pattern we place into I3 is a (set (pc) ...) + 2014-02-27 Mikael Pettersson mi...@it.uu.se Jeff Law l...@redhat.com diff --git a/gcc/combine.c b/gcc/combine.c index 1b598b4..fc473b6 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3712,6 +3712,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, #ifdef HAVE_cc0 !reg_referenced_p (cc0_rtx, set0) #endif + /* If I3 is a jump, ensure that set0 is a jump so that +we do not create invalid RTL. */ + (!JUMP_P (i3) || SET_DEST (set0) == pc_rtx) ) { newi2pat = set1; @@ -3726,6 +3729,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, #ifdef HAVE_cc0 !reg_referenced_p (cc0_rtx, set1) #endif + /* If I3 is a jump, ensure that set1 is a jump so that + we do not create invalid RTL. */ + (!JUMP_P (i3) || SET_DEST (set1) == pc_rtx) ) { newi2pat = set0; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index be4cb12..b95bcfd 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-02-27 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * gcc.c-torture/compile/pr52714.c: New test. + 2014-02-27 Mikael Pettersson mi...@it.uu.se Jeff Law l...@redhat.com diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c b/gcc/testsuite/gcc.c-torture/compile/pr52714.c new file mode 100644 index 000..03d4990 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c @@ -0,0 +1,25 @@ + +int __re_compile_fastmap(unsigned char *p) +{ +unsigned char **stack; +unsigned size; +unsigned avail; + +stack = __builtin_alloca(5 * sizeof(unsigned char*)); +if (stack == 0) + return -2; +size = 5; +avail = 0; + +for (;;) { + switch (*p++) { + case 0: + if (avail == size) + return -2; + stack[avail++] = p; + } +} + +return 0; +} +
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/17/14 02:28, Eric Botcazou wrote: So, the code has this structure if (looks safe) emit in existing order else if (reverse order looks safe) emit in reversed order else undo_all In this specific case, the existing order is never going to look safe because set1 uses (sp) as an input argument and use_crosses_set_p is very conservative when the value is the stack pointer on a PUSH_ROUNDING machine (such as the m68k) What a bummer, it seems all things are aligned so as to block the optimization here... It is a bit of a shame, as this is a powerful combination that would eliminate a branch if it succeeded. The combination of the 3 insns yields the (set (pc) (pc)) pattern and it's only because added_sets_1 is true that the annoying PARALLEL is created. Yea. As I mentioned before, I actually considered peeking inside the vector for a (set (pc) (pc)) and simplifying vector (possibly back down to a single set). I really like eliminating branches ;-0 But I couldn't convince myself it'd be worth the effort. So we could put the verification code that both I3 and I2 are INSNs in the else if (reverse order looks safe) clause.That would mean for this testcase, we ultimately undo_all. But I consider that reasonable given the only reason this instance bled into RTL land was -O1 instead of -O2 compilation. Although it's probably time to concede defeat in this particular case, I don't think we should use the NONJUMP_INSN_P big hammer because we want to eliminate branches if possible. So I'd just add the minimal test to the two existing conditions so as to prevent the invalid RTL from being created, e.g. (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx) OK. I'll give that a spin. jeff
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
So, the code has this structure if (looks safe) emit in existing order else if (reverse order looks safe) emit in reversed order else undo_all In this specific case, the existing order is never going to look safe because set1 uses (sp) as an input argument and use_crosses_set_p is very conservative when the value is the stack pointer on a PUSH_ROUNDING machine (such as the m68k) What a bummer, it seems all things are aligned so as to block the optimization here... It is a bit of a shame, as this is a powerful combination that would eliminate a branch if it succeeded. The combination of the 3 insns yields the (set (pc) (pc)) pattern and it's only because added_sets_1 is true that the annoying PARALLEL is created. So we could put the verification code that both I3 and I2 are INSNs in the else if (reverse order looks safe) clause.That would mean for this testcase, we ultimately undo_all. But I consider that reasonable given the only reason this instance bled into RTL land was -O1 instead of -O2 compilation. Although it's probably time to concede defeat in this particular case, I don't think we should use the NONJUMP_INSN_P big hammer because we want to eliminate branches if possible. So I'd just add the minimal test to the two existing conditions so as to prevent the invalid RTL from being created, e.g. (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx) -- Eric Botcazou
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/11/14 02:06, Eric Botcazou wrote: I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3-2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I didn't actually write it, just enhanced it recently, that's why I suggested to do the same here. It's one line of code and we have an example of valid simplification at hand so I think we ought to do it. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner. So, the code has this structure if (looks safe) emit in existing order else if (reverse order looks safe) emit in reversed order else undo_all In this specific case, the existing order is never going to look safe because set1 uses (sp) as an input argument and use_crosses_set_p is very conservative when the value is the stack pointer on a PUSH_ROUNDING machine (such as the m68k) So we could put the verification code that both I3 and I2 are INSNs in the else if (reverse order looks safe) clause.That would mean for this testcase, we ultimately undo_all. But I consider that reasonable given the only reason this instance bled into RTL land was -O1 instead of -O2 compilation. I already know that variant works as it's what I had before I started thinking about what happens if we have a CALL_INSN as I2. Jeff
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3-2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I didn't actually write it, just enhanced it recently, that's why I suggested to do the same here. It's one line of code and we have an example of valid simplification at hand so I think we ought to do it. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner. -- Eric Botcazou
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/11/14 02:06, Eric Botcazou wrote: I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3-2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I didn't actually write it, just enhanced it recently, that's why I suggested to do the same here. It's one line of code and we have an example of valid simplification at hand so I think we ought to do it. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner. Right, but most of the time we're doing a 2-1 or 3-1 that turns the conditional jump into a nop -- at which point all we do is modify I3 (which is the conditional jump) into a nop-jump. That common case shouldn't be affected by my change. It's only when we have a 3-2 or 4-2 combination where one of the earlier insns has a side effect we need to keep and we're able to simplify the conditional into a NOP that would be affected by my change. And I suspect that is much more rare. Regardless, I'll see what I can do. THanks, Jeff
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/08/14 03:44, Eric Botcazou wrote: This is then combined into: newpat = (parallel [ (set (pc) (pc)) (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) ]) This isn't a recognized insn, and it needs to be split. Since it's just a parallel of independent sets, combine tries to split it into a pair of assignments which look like: (insn 16 14 17 2 (set (pc) (pc)) pr52714.c:10 2147483647 {NOOP_MOVE} (nil)) (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2} (int_list:REG_BR_PROB 1014 (nil)) - 40) Note how the second is a JUMP_INSN, but it doesn't modify the PC. Opps. ISTM the code in combine which tries to rip apart a PARALLEL like that needs to ensure that I2 and I3 are both INSNs. That seems to be an unnecessary pessimization given that the combination looks perfectly valid if you swap the insns. Can't we enhance the code just below which chooses the order of the insns after splitting? I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3-2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I also pondered identifying the nop set within the parallel taking it apart again. Again, I rejected as I suspect it doesn't happen enough in practice to make it worth the effort. I pondered adding the guard for the newi2pat = set0; newpat = set1 case where we actually swap the insns. But realized we could run into similar problems in the prior conditional if I2 was a CALL_INSN that we somehow simplified. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I also pondered resetting the code on each insn via SET_CODE (whatever, INSN). It's rather hacky, but the formats are close enough that it would work. Then I realized that we can still undo_all later and didn't want to add compensation code to put the original code back. Jeff
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
This is then combined into: newpat = (parallel [ (set (pc) (pc)) (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) ]) This isn't a recognized insn, and it needs to be split. Since it's just a parallel of independent sets, combine tries to split it into a pair of assignments which look like: (insn 16 14 17 2 (set (pc) (pc)) pr52714.c:10 2147483647 {NOOP_MOVE} (nil)) (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2} (int_list:REG_BR_PROB 1014 (nil)) - 40) Note how the second is a JUMP_INSN, but it doesn't modify the PC. Opps. ISTM the code in combine which tries to rip apart a PARALLEL like that needs to ensure that I2 and I3 are both INSNs. That seems to be an unnecessary pessimization given that the combination looks perfectly valid if you swap the insns. Can't we enhance the code just below which chooses the order of the insns after splitting? -- Eric Botcazou
[RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
As mentioned in the PR, we call try_combine with: i1 = (insn 14 13 16 2 (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) pr52714.c:9 38 {*movsi_m68k2} (nil)) i2 = (insn 16 14 17 2 (set (cc0) (compare (reg/v/f:SI 34 [ stack ]) (const_int 0 [0]))) pr52714.c:10 4 {*tstsi_internal_68020_cf} (nil)) i3 = (jump_insn 17 16 18 2 (set (pc) (if_then_else (eq (cc0) (const_int 0 [0])) (label_ref 40) (pc))) pr52714.c:10 390 {beq} (int_list:REG_BR_PROB 1014 (nil)) - 40) This is then combined into: newpat = (parallel [ (set (pc) (pc)) (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) ]) This isn't a recognized insn, and it needs to be split. Since it's just a parallel of independent sets, combine tries to split it into a pair of assignments which look like: (insn 16 14 17 2 (set (pc) (pc)) pr52714.c:10 2147483647 {NOOP_MOVE} (nil)) (jump_insn 17 16 18 2 (set (reg/v/f:SI 34 [ stack ]) (reg/f:SI 15 %sp)) pr52714.c:10 38 {*movsi_m68k2} (int_list:REG_BR_PROB 1014 (nil)) - 40) Note how the second is a JUMP_INSN, but it doesn't modify the PC. Opps. ISTM the code in combine which tries to rip apart a PARALLEL like that needs to ensure that I2 and I3 are both INSNs. The astute reader would notice that this wasn't optimized at the tree level. That's an artifact of using -O1 instead of -O2. At -O2 VRP would come in and zap that test well before it ever expanded into RTL. Verified all 3 tests in the PR are fixed (two full, one reduced). Reduced test to be added to the test suite. Bootstrap and regression test in progress on x86_64-unknown-linux-gnu. OK if that passes without incident? diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e489b62..7983139 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2014-02-06 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * combine.c (try_combine): When splitting an unrecognized PARALLEL + into two independent simple sets, make sure both I3 and I2 are + INSNs (as opposed to JUMP_INSNs or CALL_INSNs). + 2014-02-06 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/aarch-cost-tables.h (cortexa57_extra_costs): New table. diff --git a/gcc/combine.c b/gcc/combine.c index fd4294b..12db3a3 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3692,7 +3692,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) ! (contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 0))) - contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1) + contains_muldiv (SET_SRC (XVECEXP (newpat, 0, 1 + /* Make sure both I2 and I3 are simple INSNs, otherwise we might + create a JUMP_INSN with a PATTERN that is a simple set. */ + GET_CODE (i2) == INSN + GET_CODE (i3) == INSN) { rtx set0 = XVECEXP (newpat, 0, 0); rtx set1 = XVECEXP (newpat, 0, 1); diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 2cac8d2..5d6e066 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-02-06 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * gcc.c-torture/compile/pr52714.c: New test. + 2014-02-06 Jakub Jelinek ja...@redhat.com PR target/59575 diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c b/gcc/testsuite/gcc.c-torture/compile/pr52714.c new file mode 100644 index 000..03d4990 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c @@ -0,0 +1,25 @@ + +int __re_compile_fastmap(unsigned char *p) +{ +unsigned char **stack; +unsigned size; +unsigned avail; + +stack = __builtin_alloca(5 * sizeof(unsigned char*)); +if (stack == 0) + return -2; +size = 5; +avail = 0; + +for (;;) { + switch (*p++) { + case 0: + if (avail == size) + return -2; + stack[avail++] = p; + } +} + +return 0; +} +