Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/18/2015 07:15 AM, Nick Clifton wrote: Hi Guys, I recently discovered a bug in the current Redundant Extension Elimination optimization. If the candidate extension instruction increases the number of hard registers used, the pass does not check to see if these extra registers are live between the definition and the extension. For example: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:QI r10) (mem:QI (reg:HI r18))) [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10))) (This is on the RL78 target where HImode values occupy two hard registers and QImode values only one. The bug however is generic, not RL78 specific). The REE pass transforms this into: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18 [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 deleted) Note how the new set at insn 45 clobbers the value loaded by insn 44 into r11. The patch below is my attempt to fix this. It is not correct however. I could not work out how to determine if a given hard register is live at any point between two insns. So instead I used the liveness information associated with the basic block containing the definition. If one of the extended registers is live or becomes live in this block, and this block is not the same block as the one containing the extension insn, then I stop the optimization. This works for the RL78 for all of the cases that I could find. So ... comments please ? Will gcc ever generate a single basic block that contains both the definition and the extension instructions, but also where the extra hard registers are used for another purpose as well ? Tested with no regressions (or fixes) on an x86-pc-linux-gnu target. Also tested with no regression and 7 fixes on an rl78-elf target. Cheers Nick gcc/ChangeLog 2015-11-18 Nick Clifton * ree.c (regs_live_between): New function. (add_removable_extension): If the extension uses more hard registers than the definition then check that the extra hard registers are not live between the definition and the extension. @@ -1080,6 +1123,30 @@ } } + /* Fourth, if the extended version occupies more registers than +the original version then make sure that these extra registers +are not live between the definition and the extension. */ + if (HARD_REGNO_NREGS (REGNO (dest), mode) + > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg))) So this seems like a reasonable place to catch this. I certainly prefer to avoid work later by filtering earlier like you've done. The problem is with your implementation of regs_live_between. You're assuming we've got straight-line code between the original set and the extension. That's not guaranteed. Essentially we've got use-def chains via the DF framework. So, in theory, they can be in different blocks and arbitrary flow control between them. In fact, the extension could be before the original set in the insn chain (though obviously not for execution order). I think it's reasonable to just punt when we see that the source/dest register of the extension are the same and the number of hard regs is different. That's what I'm testing now. jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
> I would hazard a guess that the authors simply didn't consider the > multi-hard reg case. Essentially if the original set reached an > extension, then obviously the original set got there unharmed and the > extended destination should reach as well -- except that doesn't apply > to multi-word hard regs. This pass started as a Zero-Extension Elimination pass for x86-64 and was only considering implicit SI->DI extensions initially. The algorithm was tailored to this specific pattern and I agree that the pass should be reimplemented. -- Eric Botcazou
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. So this looks like a pretty fundamental failing in ree.c and AFAICT has been there since day 1. Thankfully, I don't think it's likely to trigger all that often. It's useful to ponder two cases. One where we're going to add a copy and one where we don't. I added the copy case during the 4.9 cycle as an extension of existing code. It only fires in easy to analyze circumstances -- when the original set and the extension are in the same basic block, but have different destination registers. In the copy case. The code checks that the destination of the *extension* is neither used nor set between the original set and the extension. So in the case of multi-word hard reg, we'll verify that the entire multi-word hard reg survives from the original insn to the extension. That avoids this problem in cases where a copy is needed. However, in the case where no copy is needed, no such checks are made. In fact the checks could be fairly painful as the original set and the extension can be in different blocks with arbitrary flow between them. I would hazard a guess that the authors simply didn't consider the multi-hard reg case. Essentially if the original set reached an extension, then obviously the original set got there unharmed and the extended destination should reach as well -- except that doesn't apply to multi-word hard regs. Given the expense in checking all the potential paths between the original set and the extension, I think just disallowing this case ought to be is the best way to go. Ideally we'll filter out that case before we get to combine_set_extension. But if we can't, it's easy enough to test there. Jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Right. It just really helps me to have something I can poke at -- it's just the type of learner I am. I tried to trip the test in that #if several times last year and never came up with anything. So naturally if we can trip it with the rl78 I want to dig into it. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Perfect. Building rl78-elf cross bits as I type... jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Cheers Nick
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/18/2015 07:15 AM, Nick Clifton wrote: Hi Guys, I recently discovered a bug in the current Redundant Extension Elimination optimization. If the candidate extension instruction increases the number of hard registers used, the pass does not check to see if these extra registers are live between the definition and the extension. For example: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:QI r10) (mem:QI (reg:HI r18))) [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10))) (This is on the RL78 target where HImode values occupy two hard registers and QImode values only one. The bug however is generic, not RL78 specific). Right. I can recall walking though multi-reg cases on my whiteboard. There was one particular case Jakub was concerned about in this space, but I couldn't contrive a testcase to trigger and I was getting to the point where I was struggling to be able to reason about the behaviour of the code. The REE pass transforms this into: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18 [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 deleted) Note how the new set at insn 45 clobbers the value loaded by insn 44 into r11. Right. The patch below is my attempt to fix this. It is not correct however. I could not work out how to determine if a given hard register is live at any point between two insns. So instead I used the liveness information associated with the basic block containing the definition. If one of the extended registers is live or becomes live in this block, and this block is not the same block as the one containing the extension insn, then I stop the optimization. This works for the RL78 for all of the cases that I could find. The way this pass has dealt with this class of problems is the various routines in rtlanal.c reg_used_between_p, reg_set_between_p and the like. So ... comments please ? Will gcc ever generate a single basic block that contains both the definition and the extension instructions, but also where the extra hard registers are used for another purpose as well ? I can't see any reason why it would not. jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/19/2015 08:34 AM, Nick Clifton wrote: Hi Bernd, I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). I suspect that the code was disabled because it prevented too many useful optimization opportunities. The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/19/2015 10:18 AM, Bernd Schmidt wrote: On 11/19/2015 06:06 PM, Jeff Law wrote: Frankly, the overall structure of ree is a mess -- I've found it incredibly difficult to follow every time I've had to look at it. Yeah, no kidding. The check seems to be in the wrong place - it's done very late when we're replacing things, and IMO we should just restrict the candidates for optimization much earlier. And I didn't make it any cleaner when I hacked it up to handle one more case :(I'm hoping that a year away will allow me to look and think about the code again without getting a horrid headache. Also, I want to apply the following. Ok if testing succeeds? Bernd clean-ree1.diff commit ce68938b5150f5d41a54ed317ab97d98461be064 Author: Bernd Schmidt Date: Thu Nov 19 17:38:15 2015 +0100 Readability improvements in ree.c:combine_reaching_defs. * ree.c (combine_reaching_defs): Simplify code by caching expressions in variables. Yes. That's good with me. jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/19/2015 06:06 PM, Jeff Law wrote: Frankly, the overall structure of ree is a mess -- I've found it incredibly difficult to follow every time I've had to look at it. Yeah, no kidding. The check seems to be in the wrong place - it's done very late when we're replacing things, and IMO we should just restrict the candidates for optimization much earlier. Also, I want to apply the following. Ok if testing succeeds? Bernd commit ce68938b5150f5d41a54ed317ab97d98461be064 Author: Bernd Schmidt Date: Thu Nov 19 17:38:15 2015 +0100 Readability improvements in ree.c:combine_reaching_defs. * ree.c (combine_reaching_defs): Simplify code by caching expressions in variables. diff --git a/gcc/ree.c b/gcc/ree.c index b8436f2..4550cc3 100644 --- a/gcc/ree.c +++ b/gcc/ree.c @@ -731,6 +731,9 @@ get_extended_src_reg (rtx src) static bool combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) { + rtx cand_pat = PATTERN (cand->insn); + rtx cand_dest = SET_DEST (cand_pat); + rtx cand_src = SET_SRC (cand_pat); rtx_insn *def_insn; bool merge_successful = true; int i; @@ -749,9 +752,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) register than the source operand, then additional restrictions are needed. Note we have to handle cases where we have nested extensions in the source operand. */ - bool copy_needed -= (REGNO (SET_DEST (PATTERN (cand->insn))) - != REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn); + rtx src_reg = get_extended_src_reg (cand_src); + bool copy_needed = REGNO (cand_dest) != REGNO (src_reg); if (copy_needed) { /* Considering transformation of @@ -780,8 +782,7 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE) return false; - machine_mode dst_mode = GET_MODE (SET_DEST (PATTERN (cand->insn))); - rtx src_reg = get_extended_src_reg (SET_SRC (PATTERN (cand->insn))); + machine_mode dst_mode = GET_MODE (cand_dest); /* Ensure the number of hard registers of the copy match. */ if (HARD_REGNO_NREGS (REGNO (src_reg), dst_mode) @@ -812,17 +813,15 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) || !REG_P (SET_DEST (*dest_sub_rtx))) return false; - rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))), + rtx tmp_reg = gen_rtx_REG (GET_MODE (cand_dest), REGNO (SET_DEST (*dest_sub_rtx))); - if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn + if (reg_overlap_mentioned_p (tmp_reg, cand_dest)) return false; /* The destination register of the extension insn must not be used or set between the def_insn and cand->insn exclusive. */ - if (reg_used_between_p (SET_DEST (PATTERN (cand->insn)), - def_insn, cand->insn) - || reg_set_between_p (SET_DEST (PATTERN (cand->insn)), -def_insn, cand->insn)) + if (reg_used_between_p (cand_dest, def_insn, cand->insn) + || reg_set_between_p (cand_dest, def_insn, cand->insn)) return false; /* We must be able to copy between the two registers. Generate, @@ -836,11 +835,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) is different than in the code to emit the copy as we have not modified the defining insn yet. */ start_sequence (); - rtx pat = PATTERN (cand->insn); - rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (pat)), - REGNO (get_extended_src_reg (SET_SRC (pat; - rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (pat)), - REGNO (SET_DEST (pat))); + rtx new_dst = gen_rtx_REG (GET_MODE (cand_dest), REGNO (src_reg)); + rtx new_src = gen_rtx_REG (GET_MODE (cand_dest), REGNO (cand_dest)); emit_move_insn (new_dst, new_src); rtx_insn *insn = get_insns(); @@ -854,7 +850,6 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state) return false; } - /* If cand->insn has been already modified, update cand->mode to a wider mode if possible, or punt. */ if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/19/2015 09:53 AM, Bernd Schmidt wrote: On 11/19/2015 04:34 PM, Nick Clifton wrote: Hi Bernd, I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). I suspect that the code was disabled because it prevented too many useful optimization opportunities. Well, correctness goes first. Since reenabling it fixes your problem, that change is approved (conditional on testing as usual obviously). I couldn't convince myself it was a correct check. Having Nick's testcase should help with the analysis. Frankly, the overall structure of ree is a mess -- I've found it incredibly difficult to follow every time I've had to look at it. Let me take a peek today and see if I can get familiar enough with this code again to help. Jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/19/2015 04:34 PM, Nick Clifton wrote: Hi Bernd, I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). I suspect that the code was disabled because it prevented too many useful optimization opportunities. Well, correctness goes first. Since reenabling it fixes your problem, that change is approved (conditional on testing as usual obviously). Bernd
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
Hi Bernd, I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). I suspect that the code was disabled because it prevented too many useful optimization opportunities. The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Cheers Nick
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/18/2015 05:52 PM, Bernd Schmidt wrote: (This is on the RL78 target where HImode values occupy two hard registers and QImode values only one. The bug however is generic, not RL78 specific). The REE pass transforms this into: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18 [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 deleted) Note how the new set at insn 45 clobbers the value loaded by insn 44 into r11. I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). I was going to dig into this discussion tomorrow since I was the last one in that code. Just had to get that threading regression resolved first :-) jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
(This is on the RL78 target where HImode values occupy two hard registers and QImode values only one. The bug however is generic, not RL78 specific). The REE pass transforms this into: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18 [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 deleted) Note how the new set at insn 45 clobbers the value loaded by insn 44 into r11. I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). Bernd