Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Jeff Law

On 11/18/2015 07:15 AM, Nick Clifton wrote:

Hi Guys,

   I recently discovered a bug in the current Redundant Extension
   Elimination optimization.  If the candidate extension instruction
   increases the number of hard registers used, the pass does not check
   to see if these extra registers are live between the definition and
   the extension.

   For example:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))

   (This is on the RL78 target where HImode values occupy two hard
   registers and QImode values only one.  The bug however is generic, not
   RL78 specific).

   The REE pass transforms this into:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 deleted)

   Note how the new set at insn 45 clobbers the value loaded by insn 44
   into r11.

   The patch below is my attempt to fix this.  It is not correct
   however.  I could not work out how to determine if a given hard
   register is live at any point between two insns.  So instead I used
   the liveness information associated with the basic block containing
   the definition.  If one of the extended registers is live or becomes
   live in this block, and this block is not the same block as the one
   containing the extension insn, then I stop the optimization.  This
   works for the RL78 for all of the cases that I could find.

   So ... comments please ?

   Will gcc ever generate a single basic block that contains both the
   definition and the extension instructions, but also where the extra
   hard registers are used for another purpose as well ?

   Tested with no regressions (or fixes) on an x86-pc-linux-gnu target.
   Also tested with no regression and 7 fixes on an rl78-elf target.

Cheers
   Nick

gcc/ChangeLog
2015-11-18  Nick Clifton  

* ree.c (regs_live_between): New function.
 (add_removable_extension): If the extension uses more hard
 registers than the definition then check that the extra hard
 registers are not live between the definition and the
 extension.

@@ -1080,6 +1123,30 @@
  }
  }

+  /* Fourth, if the extended version occupies more registers than
+the original version then make sure that these extra registers
+are not live between the definition and the extension.  */
+  if (HARD_REGNO_NREGS (REGNO (dest), mode)
+ > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
So this seems like a reasonable place to catch this.  I certainly prefer 
to avoid work later by filtering earlier like you've done.


The problem is with your implementation of regs_live_between.  You're 
assuming we've got straight-line code between the original set and the 
extension.  That's not guaranteed.  Essentially we've got use-def chains 
via the DF framework.  So, in theory, they can be in different blocks 
and arbitrary flow control between them.  In fact, the extension could 
be before the original set in the insn chain (though obviously not for 
execution order).


I think it's reasonable to just punt when we see that the source/dest 
register of the extension are the same and the number of hard regs is 
different.  That's what I'm testing now.


jeff




Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Eric Botcazou
> I would hazard a guess that the authors simply didn't consider the
> multi-hard reg case.  Essentially if the original set reached an
> extension, then obviously the original set got there unharmed and the
> extended destination should reach as well -- except that doesn't apply
> to multi-word hard regs.

This pass started as a Zero-Extension Elimination pass for x86-64 and was only 
considering implicit SI->DI extensions initially.  The algorithm was tailored 
to this specific pattern and I agree that the pass should be reimplemented.

-- 
Eric Botcazou


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Jeff Law

On 11/20/2015 02:19 AM, Nick Clifton wrote:

Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is
generic, but I have not tested other toolchains.

Compile the gcc.c-torture/execute/pr42833.c or
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
also specified on the command line.  Without -free these tests pass.
With -free they fail.
So this looks like a pretty fundamental failing in ree.c and AFAICT has 
been there since day 1.  Thankfully, I don't think it's likely to 
trigger all that often.


It's useful to ponder two cases.  One where we're going to add a copy 
and one where we don't.  I added the copy case during the 4.9 cycle as 
an extension of existing code.  It only fires in easy to analyze 
circumstances -- when the original set and the extension are in the same 
basic block, but have different destination registers.


In the copy case.  The code checks that the destination of the 
*extension* is neither used nor set between the original set and the 
extension.  So in the case of multi-word hard reg, we'll verify that the 
entire multi-word hard reg survives from the original insn to the 
extension.  That avoids this problem in cases where a copy is needed.


However, in the case where no copy is needed, no such checks are made. 
In fact the checks could be fairly painful as the original set and the 
extension can be in different blocks with arbitrary flow between them.


I would hazard a guess that the authors simply didn't consider the 
multi-hard reg case.  Essentially if the original set reached an 
extension, then obviously the original set got there unharmed and the 
extended destination should reach as well -- except that doesn't apply 
to multi-word hard regs.


Given the expense in checking all the potential paths between the 
original set and the extension, I think just disallowing this case ought 
to be is the best way to go.  Ideally we'll filter out that case before 
we get to combine_set_extension.  But if we can't, it's easy enough to 
test there.


Jeff


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Jeff Law

On 11/20/2015 02:19 AM, Nick Clifton wrote:

Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is
generic, but I have not tested other toolchains.
Right.  It just really helps me to have something I can poke at -- it's 
just the type of learner I am.  I tried to trip the test in that #if 
several times last year and never came up with anything.  So naturally 
if we can trip it with the rl78 I want to dig into it.





Compile the gcc.c-torture/execute/pr42833.c or
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
also specified on the command line.  Without -free these tests pass.
With -free they fail.

Perfect.  Building rl78-elf cross bits as I type...

jeff


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Nick Clifton

Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is 
generic, but I have not tested other toolchains.


Compile the gcc.c-torture/execute/pr42833.c or 
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free 
also specified on the command line.  Without -free these tests pass. 
With -free they fail.


Cheers
  Nick



Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Jeff Law

On 11/18/2015 07:15 AM, Nick Clifton wrote:

Hi Guys,

   I recently discovered a bug in the current Redundant Extension
   Elimination optimization.  If the candidate extension instruction
   increases the number of hard registers used, the pass does not check
   to see if these extra registers are live between the definition and
   the extension.

   For example:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:QI r10) (mem:QI (reg:HI r18)))
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10)))

   (This is on the RL78 target where HImode values occupy two hard
   registers and QImode values only one.  The bug however is generic, not
   RL78 specific).
Right.  I can recall walking though multi-reg cases on my whiteboard. 
There was one particular case Jakub was concerned about in this space, 
but I couldn't contrive a testcase to trigger and I was getting to the 
point where I was struggling to be able to reason about the behaviour of 
the code.







   The REE pass transforms this into:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 deleted)

   Note how the new set at insn 45 clobbers the value loaded by insn 44
   into r11.

Right.



   The patch below is my attempt to fix this.  It is not correct
   however.  I could not work out how to determine if a given hard
   register is live at any point between two insns.  So instead I used
   the liveness information associated with the basic block containing
   the definition.  If one of the extended registers is live or becomes
   live in this block, and this block is not the same block as the one
   containing the extension insn, then I stop the optimization.  This
   works for the RL78 for all of the cases that I could find.
The way this pass has dealt with this class of problems is the various 
routines in rtlanal.c  reg_used_between_p, reg_set_between_p and the like.





   So ... comments please ?

   Will gcc ever generate a single basic block that contains both the
   definition and the extension instructions, but also where the extra
   hard registers are used for another purpose as well ?

I can't see any reason why it would not.


jeff


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Jeff Law

On 11/19/2015 08:34 AM, Nick Clifton wrote:

Hi Bernd,


I had a look around. There's code testing HARD_REGNO_NREGS in
ree.c:combine_set_extension. It's inside #if 0, and labelled
"temporarily disabled". See if enabling that helps you? (Jeff, that #if
0 was added by you).


I suspect that the code was disabled because it prevented too many
useful optimization opportunities.

The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?

Jeff



Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Jeff Law

On 11/19/2015 10:18 AM, Bernd Schmidt wrote:

On 11/19/2015 06:06 PM, Jeff Law wrote:


Frankly, the overall structure of ree is a mess -- I've found it
incredibly difficult to follow every time I've had to look at it.


Yeah, no kidding. The check seems to be in the wrong place - it's done
very late when we're replacing things, and IMO we should just restrict
the candidates for optimization much earlier.
And I didn't make it any cleaner when I hacked it up to handle one more 
case :(I'm hoping that a year away will allow me to look and think 
about the code again without getting a horrid headache.





Also, I want to apply the following. Ok if testing succeeds?


Bernd

clean-ree1.diff


commit ce68938b5150f5d41a54ed317ab97d98461be064
Author: Bernd Schmidt
Date:   Thu Nov 19 17:38:15 2015 +0100

 Readability improvements in ree.c:combine_reaching_defs.

* ree.c (combine_reaching_defs): Simplify code by caching expressions
in variables.

Yes. That's good with me.

jeff



Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Bernd Schmidt

On 11/19/2015 06:06 PM, Jeff Law wrote:


Frankly, the overall structure of ree is a mess -- I've found it
incredibly difficult to follow every time I've had to look at it.


Yeah, no kidding. The check seems to be in the wrong place - it's done 
very late when we're replacing things, and IMO we should just restrict 
the candidates for optimization much earlier.


Also, I want to apply the following. Ok if testing succeeds?


Bernd
commit ce68938b5150f5d41a54ed317ab97d98461be064
Author: Bernd Schmidt 
Date:   Thu Nov 19 17:38:15 2015 +0100

Readability improvements in ree.c:combine_reaching_defs.

	* ree.c (combine_reaching_defs): Simplify code by caching expressions
	in variables.

diff --git a/gcc/ree.c b/gcc/ree.c
index b8436f2..4550cc3 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -731,6 +731,9 @@ get_extended_src_reg (rtx src)
 static bool
 combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 {
+  rtx cand_pat = PATTERN (cand->insn);
+  rtx cand_dest = SET_DEST (cand_pat);
+  rtx cand_src = SET_SRC (cand_pat);
   rtx_insn *def_insn;
   bool merge_successful = true;
   int i;
@@ -749,9 +752,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
  register than the source operand, then additional restrictions
  are needed.  Note we have to handle cases where we have nested
  extensions in the source operand.  */
-  bool copy_needed
-= (REGNO (SET_DEST (PATTERN (cand->insn)))
-   != REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn);
+  rtx src_reg = get_extended_src_reg (cand_src);
+  bool copy_needed = REGNO (cand_dest) != REGNO (src_reg);
   if (copy_needed)
 {
   /* Considering transformation of
@@ -780,8 +782,7 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
   if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)
 	return false;
 
-  machine_mode dst_mode = GET_MODE (SET_DEST (PATTERN (cand->insn)));
-  rtx src_reg = get_extended_src_reg (SET_SRC (PATTERN (cand->insn)));
+  machine_mode dst_mode = GET_MODE (cand_dest);
 
   /* Ensure the number of hard registers of the copy match.  */
   if (HARD_REGNO_NREGS (REGNO (src_reg), dst_mode)
@@ -812,17 +813,15 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	  || !REG_P (SET_DEST (*dest_sub_rtx)))
 	return false;
 
-  rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
+  rtx tmp_reg = gen_rtx_REG (GET_MODE (cand_dest),
  REGNO (SET_DEST (*dest_sub_rtx)));
-  if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn
+  if (reg_overlap_mentioned_p (tmp_reg, cand_dest))
 	return false;
 
   /* The destination register of the extension insn must not be
 	 used or set between the def_insn and cand->insn exclusive.  */
-  if (reg_used_between_p (SET_DEST (PATTERN (cand->insn)),
-			  def_insn, cand->insn)
-	  || reg_set_between_p (SET_DEST (PATTERN (cand->insn)),
-def_insn, cand->insn))
+  if (reg_used_between_p (cand_dest, def_insn, cand->insn)
+	  || reg_set_between_p (cand_dest, def_insn, cand->insn))
 	return false;
 
   /* We must be able to copy between the two registers.   Generate,
@@ -836,11 +835,8 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	 is different than in the code to emit the copy as we have not
 	 modified the defining insn yet.  */
   start_sequence ();
-  rtx pat = PATTERN (cand->insn);
-  rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
- REGNO (get_extended_src_reg (SET_SRC (pat;
-  rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (pat)),
- REGNO (SET_DEST (pat)));
+  rtx new_dst = gen_rtx_REG (GET_MODE (cand_dest), REGNO (src_reg));
+  rtx new_src = gen_rtx_REG (GET_MODE (cand_dest), REGNO (cand_dest));
   emit_move_insn (new_dst, new_src);
 
   rtx_insn *insn = get_insns();
@@ -854,7 +850,6 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 	return false;
 }
 
-
   /* If cand->insn has been already modified, update cand->mode to a wider
  mode if possible, or punt.  */
   if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE)


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Jeff Law

On 11/19/2015 09:53 AM, Bernd Schmidt wrote:

On 11/19/2015 04:34 PM, Nick Clifton wrote:

Hi Bernd,


I had a look around. There's code testing HARD_REGNO_NREGS in
ree.c:combine_set_extension. It's inside #if 0, and labelled
"temporarily disabled". See if enabling that helps you? (Jeff, that #if
0 was added by you).


I suspect that the code was disabled because it prevented too many
useful optimization opportunities.


Well, correctness goes first. Since reenabling it fixes your problem,
that change is approved (conditional on testing as usual obviously).
I couldn't convince myself it was a correct check.  Having Nick's 
testcase should help with the analysis.


Frankly, the overall structure of ree is a mess -- I've found it 
incredibly difficult to follow every time I've had to look at it.


Let me take a peek today and see if I can get familiar enough with this 
code again to help.



Jeff


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Bernd Schmidt

On 11/19/2015 04:34 PM, Nick Clifton wrote:

Hi Bernd,


I had a look around. There's code testing HARD_REGNO_NREGS in
ree.c:combine_set_extension. It's inside #if 0, and labelled
"temporarily disabled". See if enabling that helps you? (Jeff, that #if
0 was added by you).


I suspect that the code was disabled because it prevented too many
useful optimization opportunities.


Well, correctness goes first. Since reenabling it fixes your problem, 
that change is approved (conditional on testing as usual obviously).



Bernd


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-19 Thread Nick Clifton

Hi Bernd,


I had a look around. There's code testing HARD_REGNO_NREGS in
ree.c:combine_set_extension. It's inside #if 0, and labelled
"temporarily disabled". See if enabling that helps you? (Jeff, that #if
0 was added by you).


I suspect that the code was disabled because it prevented too many 
useful optimization opportunities.


The code there would solve this problem, but the approach is is overly 
cautious, since it disables the optimization for all extensions that 
increase the number of hard registers used.  Some of these will be 
viable candidates, provided that the extra hard registers are no used. 
(This is certainly true for the RL78, where the (patched) optimization 
does improve code, even though the widening does use extra registers).


Cheers
  Nick




Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-18 Thread Jeff Law

On 11/18/2015 05:52 PM, Bernd Schmidt wrote:

   (This is on the RL78 target where HImode values occupy two hard
   registers and QImode values only one.  The bug however is generic, not
   RL78 specific).

   The REE pass transforms this into:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 deleted)

   Note how the new set at insn 45 clobbers the value loaded by insn 44
   into r11.


I had a look around. There's code testing HARD_REGNO_NREGS in
ree.c:combine_set_extension. It's inside #if 0, and labelled
"temporarily disabled". See if enabling that helps you? (Jeff, that #if
0 was added by you).
I was going to dig into this discussion tomorrow since I was the last 
one in that code.  Just had to get that threading regression resolved 
first :-)


jeff


Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-18 Thread Bernd Schmidt

   (This is on the RL78 target where HImode values occupy two hard
   registers and QImode values only one.  The bug however is generic, not
   RL78 specific).

   The REE pass transforms this into:

 (insn  44 (set (reg:QI r11) (mem:QI (reg:HI r20)))
 (insn  45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18
 [...]
 (insn  71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11)))
 [...]
 (insn  88 deleted)

   Note how the new set at insn 45 clobbers the value loaded by insn 44
   into r11.


I had a look around. There's code testing HARD_REGNO_NREGS in 
ree.c:combine_set_extension. It's inside #if 0, and labelled 
"temporarily disabled". See if enabling that helps you? (Jeff, that #if 
0 was added by you).



Bernd