Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-06 Thread Christophe Lyon
On 4 November 2015 at 16:37, James Greenhalgh  wrote:
>
> On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
>> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
>> >+ i = tmp_i; <- Should be cleaned up
>>
>> Maybe reword as "Subsequent passes are expected to clean up the
>> extra moves", otherwise it sounds like a TODO item.
>>
>> >+   read back in anotyher SET, as might occur in a swap idiom or
>>
>> Typo.
>>
>> >+  if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
>> >+{
>> >+  /* The write to targets[i] is only live until the read
>> >+ here.  As the condition codes match, we can propagate
>> >+ the set to here.  */
>> >+   new_val = SET_SRC (single_set (unmodified_insns[i]));
>> >+}
>>
>> Shouldn't use braces around single statements (also goes for the
>> surrounding for loop).
>>
>> >+  /* We must have at least one real insn to convert, or there will
>> >+ be trouble!  */
>> >+  unsigned count = 0;
>>
>> The comment seems a bit strange in this context - I think it's left
>> over from the earlier version?
>>
>> As far as I'm concerned this is otherwise ok.
>
> Thanks,
>
> I've updated the patch with those issues addressed. As the cost model was
> controversial in an earlier revision, I'll leave this on list for 24 hours
> and, if nobody jumps in to object, commit it tomorrow.
>
> I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
> just to check that I got the braces right, with no issues.
>

The new test does not pass on some ARM configurations, I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Christophe.

> Thanks,
> James
>
> ---
> gcc/
>
> 2015-11-04  James Greenhalgh  
>
> * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
> (noce_convert_multiple_sets): Likewise.
> (noce_process_if_block): Call them.
>
> gcc/testsuite/
>
> 2015-11-04  James Greenhalgh  
>
> * gcc.dg/ifcvt-4.c: New.
>


Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-04 Thread James Greenhalgh

On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
> >+ i = tmp_i; <- Should be cleaned up
>
> Maybe reword as "Subsequent passes are expected to clean up the
> extra moves", otherwise it sounds like a TODO item.
>
> >+   read back in anotyher SET, as might occur in a swap idiom or
>
> Typo.
>
> >+  if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> >+{
> >+  /* The write to targets[i] is only live until the read
> >+ here.  As the condition codes match, we can propagate
> >+ the set to here.  */
> >+   new_val = SET_SRC (single_set (unmodified_insns[i]));
> >+}
>
> Shouldn't use braces around single statements (also goes for the
> surrounding for loop).
>
> >+  /* We must have at least one real insn to convert, or there will
> >+ be trouble!  */
> >+  unsigned count = 0;
>
> The comment seems a bit strange in this context - I think it's left
> over from the earlier version?
>
> As far as I'm concerned this is otherwise ok.

Thanks,

I've updated the patch with those issues addressed. As the cost model was
controversial in an earlier revision, I'll leave this on list for 24 hours
and, if nobody jumps in to object, commit it tomorrow.

I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
just to check that I got the braces right, with no issues.

Thanks,
James

---
gcc/

2015-11-04  James Greenhalgh  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Call them.

gcc/testsuite/

2015-11-04  James Greenhalgh  

* gcc.dg/ifcvt-4.c: New.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f23d9afd..1c33283 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3016,6 +3016,244 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+ if (x > y)
+   { i = a; j = b; k = c; }
+
+   Make it:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? b : j;
+ tmp_k = (x > y) ? c : k;
+ i = tmp_i;
+ j = tmp_j;
+ k = tmp_k;
+
+   Subsequent passes are expected to clean up the extra moves.
+
+   Look for special cases such as writes to one register which are
+   read back in another SET, as might occur in a swap idiom or
+   similar.
+
+   These look like:
+
+   if (x > y)
+ i = a;
+ j = i;
+
+   Which we want to rewrite to:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? tmp_i : j;
+ i = tmp_i;
+ j = tmp_j;
+
+   We can catch these when looking at (SET x y) by keeping a list of the
+   registers we would have targeted before if-conversion and looking back
+   through it for an overlap with Y.  If we find one, we rewire the
+   conditional set to use the temporary we introduced earlier.
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+ overlap.  */
+  vec temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec unmodified_insns = vNULL;
+  int count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+{
+  /* Skip over non-insns.  */
+  if (!active_insn_p (insn))
+	continue;
+
+  rtx set = single_set (insn);
+  gcc_checking_assert (set);
+
+  rtx target = SET_DEST (set);
+  rtx temp = gen_reg_rtx (GET_MODE (target));
+  rtx new_val = SET_SRC (set);
+  rtx old_val = target;
+
+  /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+  for (int i = count - 1; i >= 0; --i)
+	if (reg_overlap_mentioned_p (new_val, targets[i]))
+	  {
+	/* Catch a "swap" style idiom.  */
+	if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+	  /* The write to targets[i] is only live until the read
+		 here.  As the condition codes match, we can propagate
+		 the set to here.  */
+	  new_val = SET_SRC (single_set (unmodified_insns[i]));
+	else
+	  new_val = temporaries[i];
+	break;
+	  }
+
+  /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */

Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-04 Thread Bernd Schmidt

On 10/30/2015 07:03 PM, James Greenhalgh wrote:

+ i = tmp_i; <- Should be cleaned up


Maybe reword as "Subsequent passes are expected to clean up the extra 
moves", otherwise it sounds like a TODO item.



+   read back in anotyher SET, as might occur in a swap idiom or


Typo.


+ if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+   {
+ /* The write to targets[i] is only live until the read
+here.  As the condition codes match, we can propagate
+the set to here.  */
+  new_val = SET_SRC (single_set (unmodified_insns[i]));
+   }


Shouldn't use braces around single statements (also goes for the 
surrounding for loop).



+  /* We must have at least one real insn to convert, or there will
+ be trouble!  */
+  unsigned count = 0;


The comment seems a bit strange in this context - I think it's left over 
from the earlier version?


As far as I'm concerned this is otherwise ok.


Bernd


Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-10-30 Thread James Greenhalgh

On Thu, Sep 10, 2015 at 07:23:28PM +0100, Bernd Schmidt wrote:
> On 09/08/2015 04:53 PM, James Greenhalgh wrote:
> > One big question I have with this patch is how I ought to write a meaningful
> > cost model I've used. It seems like yet another misuse of RTX costs, and
> > another bit of stuff for targets to carefully balance. Now, if the
> > relative cost of branches and conditional move instructions is not
> > carefully managed, you may enable or disable these optimisations. This is
> > probably acceptable, but I dislike adding more and more gotcha's to
> > target costs, as I get bitten by them hard enough as is!
>
> The code you have seems reasonable, except that for compile time it
> might make sense to not even attempt the optimization if the number of
> sets is too large. I'm not too worried about that, but maybe you could
> bail out early if your cost estimate goes too much above the branch cost.

Thanks all for the review of this patch. I was hoping to look at fixing the
costs model first, but that seems likely to continue to be controversial
and worthy of a more considerable period to work through kinks in the
design rather than rushed in to the closing days of Stage 1. I'll get back
to the list soon with what I'd like to achieve with a costs rework and
how I plan to go about it.

In the interim, I'd like to try to find a route to get this patch accepted
and committed.

The goal after I rewrite the costs should be for us to use whatever
new replacement_cost (or similar) hook we end up building. That will
necessitate building and optimising some RTX which we are going to throw
away, I think that is the proffered direction (though still needs
discussed) and compile time will be a necessary sacrifice I'm afraid.

In the FORNOW cost model, we just count instructions. We can do that
early and avoid creating the extra RTX. That's what I do here.

> > +  /* If we were supposed to read from an earlier write in this block,
> > +we've changed the register allocation.  Rewire the read.  While
> > +we are looking, also try to catch a swap idiom.  */
>
> So this is one interesting case; do you also have to worry about others
> (such as maybe setting the same register multiple times)?

I'm hoping that cases like that will have either been cleaned up already
or will be cleaned up later. For the particular example you mention, we
would have something like:

if (a > b)
  x = 1;
  y = 2;
  x = x;
  y = x;

And would if-convert (catching with the existing swap idiom check) to:

  tmp_x = (a > b) ? 1 : x;
  tmp_y = (a > b) ? 2 : y;
  tmp_x2 = (a > b) ? 1 : x;
  tmp_y2 = (a > b) ? 1 : y;
  x = tmp_x
  y = tmp_y
  x = tmp_x2
  y = tmp_y2

Which, while ugly, one would hope would be cleaned to:

  tmp_x2 = (a > b) ? 1 : x;
  tmp_y2 = (a > b) ? 1 : y;
  x = tmp_x2
  y = tmp_y2

I've expanded the comment to talk about this case.

> > +  /* We must have seen some sort of insn to insert, otherwise we were
> > + given an empty BB to convert, and we can't handle that.  */
> > +  if (unmodified_insns.is_empty ())
> > +{
> > +  end_sequence ();
> > +  return FALSE;
> > +}
>
> Looks like some of the error conditions are tested twice across the two
> new functions? I think it would be better to get rid of one copy or turn
> the second one into a gcc_assert.

I've cleaned this up a little, thanks.

>
>  > No testcase provided, as currently I don't know of targets with a high
>  > enough branch cost to actually trigger the optimisation.
>
> Hmm, so the code would not actually be used right now? In that case I'll
> leave it to others to decide whether we want to apply it. Other than the
> points above it looks OK to me.

We have more than a few targets with BRANCH_COST >= 2 for unpredictable
branches, so I've put together a testcase which exercises at least some of
the code. I'll need to rely on target maintainers to XFAIL their target
(or at least to shout at me if a FAIL pops up, and I can XFAIL their
target).

I've boosttrapped this revision on aarch64-none-linux-gnu and
x86_64-none-linux-gnu with no issues.

By inspection of the x86_64 and aarch64 generated code for Spec2006 we
if-convert in a few additional places (though within the defined cost
model) and we end up with inverse condition codes for the cmoves we put
out. Neither of these looked bad to me.

Is this OK to progress as is while I build myself up to wrestle with
costs for GCC 7?

Thanks,
James

---
gcc/

2015-10-30  James Greenhalgh  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Call them.

gcc/testsuite/

2015-10-30  James Greenhalgh  

* gcc.dg/ifcvt-4.c: New.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 592e86d..4071490 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3025,6 +3025,249 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+ if