Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-12 Thread Eric Botcazou
> Some targets have -mbranch-cost to allow overriding the default costing.
>   visium has a branch cost of 10!

Yeah, the GR5 variant is pipelined but has no branch prediction; moreover 
there is an additional adverse effect coming for the instructions bus...

>   Several ports have a cost of 6 either unconditionally or when the branch
>   is not well predicted.

9 for UltraSPARC3, although this should probably be lowered if predictable_p.

-- 
Eric Botcazou


Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Jeff Law

On 09/11/2015 02:49 AM, Kyrill Tkachov wrote:


On 10/09/15 22:11, Jeff Law wrote:

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:

  > No testcase provided, as currently I don't know of targets with a
high
  > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll
leave it to others to decide whether we want to apply it. Other than the
points above it looks OK to me.

Some targets have -mbranch-cost to allow overriding the default costing.
   visium has a branch cost of 10!  Several ports have a cost of 6 either
unconditionally or when the branch is not well predicted.

Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.


Out of interest, what was the intended original meaning
of branch costs if it was not to be relative to instructions?
I don't think it ever had one.  It's self-relative.  A cost of 2 is 
greater than a cost of 1.  No more, no less IIRC.   Lame?  Yes. 
Short-sighted?  Yes.  Should we try to fix it.  Yes.


If you look at how BRANCH_COST actually gets used, AFAIK it's tested 
only against "magic constants", which are themselves lame, short-sighted 
and need to be fixed.


jeff



Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread James Greenhalgh
On Fri, Sep 11, 2015 at 10:04:12AM +0100, Ramana Radhakrishnan wrote:
> On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> > On 09/10/2015 11:11 PM, Jeff Law wrote:
> > >I think that's probably what James is most interested in getting some
> > >ideas around -- the cost model.
> > >
> > >I think the fundamental problem is BRANCH_COST isn't actually relative
> > >to anything other than the default value of "1".  It doesn't directly
> > >correspond to COSTS_N_INSNS or anything else.  So while using
> > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> > >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> > >other than it's more expensive.
> > >
> > >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> > >with BRANCH_COST having no meaning relative to anything else I can see
> > >why Richard did things that way.
> > >
> > >In an ideal world we'd find some mapping from BRANCH_COST that relates
> > >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> > >and we'll likely regress targets with any simplistic mapping.  But maybe
> > >now is the time to address that fundamental problem and deal with the
> > >fallout.
> > 
> > I think the right approach if we want to fix this is a new
> > branch_cost_ninsns target hook (maybe with arguments
> > taken_percentage, predictability), and gradually move everything to
> > use that instead of BRANCH_COST.
> 
> Perhaps providing backends with the entire if-then-else block along
> with the above mentioned information being if converted may be another
> approach, it allows the backends to analyse what cases are good to
> if-convert as per the ISA or micro-architecture and what aren't.

I'm not sure how much of this is likely to be target-dependent and how
much can just be abstracted to common ifcvt code resuing rtx_costs.

I've been sketching out a rough idea of a more applicable cost model for
RTL ifcvt, taking in to consideration what David mentioned regarding the
talks at cauldron. The question we want to ask is:

Which is preferable between:

  Before:
   (COSTS_N_INSNS cost of the compare+branch insns at the tail of the if BB.
 ??? (possibly) some factor related to BRANCH_COST)
   + weighted cost of then BB.
   + (if needed) weighted cost of else BB.

  After:
   seq_cost the candidate new sequence.
  
The weighted cost of the two BBs should mix in some idea as to the relative
probability that we execute them.

The tough part is figuring out how to (reasonably) factor in branch cost.
The reason that is tough is that BRANCH_COST is used inconsistently. Normally
it is not measured relative to anything, but is compared against magic numbers
for optimizations (each of which are really their own question to be posed
as above).

I don't have a good answer to that, nor a good answer as to what BRANCH_COST
should represent in future. The use in the compiler is sort-of consistent
with a measurement against instruction counts (i.e. a branch cost of 3 means
a branch is equivalent to 3 cheap instructions), but is sometimes just used
as a measure of expensive (a branch cost of >= 2 means that abs should be
expanded using a sequence of bit operations).

I'll look in to how the code in ifcvt starts to look with a modified cost
model and get back to you...

James



Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Ramana Radhakrishnan
On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> On 09/10/2015 11:11 PM, Jeff Law wrote:
> >I think that's probably what James is most interested in getting some
> >ideas around -- the cost model.
> >
> >I think the fundamental problem is BRANCH_COST isn't actually relative
> >to anything other than the default value of "1".  It doesn't directly
> >correspond to COSTS_N_INSNS or anything else.  So while using
> >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> >other than it's more expensive.
> >
> >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> >with BRANCH_COST having no meaning relative to anything else I can see
> >why Richard did things that way.
> >
> >In an ideal world we'd find some mapping from BRANCH_COST that relates
> >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> >and we'll likely regress targets with any simplistic mapping.  But maybe
> >now is the time to address that fundamental problem and deal with the
> >fallout.
> 
> I think the right approach if we want to fix this is a new
> branch_cost_ninsns target hook (maybe with arguments
> taken_percentage, predictability), and gradually move everything to
> use that instead of BRANCH_COST.

Perhaps providing backends with the entire if-then-else block along
with the above mentioned information being if converted may be another
approach, it allows the backends to analyse what cases are good to
if-convert as per the ISA or micro-architecture and what aren't.

regards
Ramana



Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Bernd Schmidt

On 09/10/2015 11:11 PM, Jeff Law wrote:

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.

In an ideal world we'd find some mapping from BRANCH_COST that relates
to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
and we'll likely regress targets with any simplistic mapping.  But maybe
now is the time to address that fundamental problem and deal with the
fallout.


I think the right approach if we want to fix this is a new 
branch_cost_ninsns target hook (maybe with arguments taken_percentage, 
predictability), and gradually move everything to use that instead of 
BRANCH_COST.



Bernd




Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Kyrill Tkachov


On 10/09/15 22:11, Jeff Law wrote:

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:

  > No testcase provided, as currently I don't know of targets with a high
  > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll
leave it to others to decide whether we want to apply it. Other than the
points above it looks OK to me.

Some targets have -mbranch-cost to allow overriding the default costing.
   visium has a branch cost of 10!  Several ports have a cost of 6 either
unconditionally or when the branch is not well predicted.

Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.


Out of interest, what was the intended original meaning
of branch costs if it was not to be relative to instructions?

Thanks,
Kyrill


In an ideal world we'd find some mapping from BRANCH_COST that relates
to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
and we'll likely regress targets with any simplistic mapping.  But maybe
now is the time to address that fundamental problem and deal with the
fallout.

jeff






Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-10 Thread Jeff Law

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:


 > No testcase provided, as currently I don't know of targets with a high
 > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll
leave it to others to decide whether we want to apply it. Other than the
points above it looks OK to me.
Some targets have -mbranch-cost to allow overriding the default costing. 
 visium has a branch cost of 10!  Several ports have a cost of 6 either 
unconditionally or when the branch is not well predicted.


Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some 
ideas around -- the cost model.


I think the fundamental problem is BRANCH_COST isn't actually relative 
to anything other than the default value of "1".  It doesn't directly 
correspond to COSTS_N_INSNS or anything else.  So while using 
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually 
doesn't.  It's not even clear how a value of 10 relates to a value of 1 
other than it's more expensive.


ifcvt (and others) comparing to magic #s is more than a bit lame.  But 
with BRANCH_COST having no meaning relative to anything else I can see 
why Richard did things that way.


In an ideal world we'd find some mapping from BRANCH_COST that relates 
to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist 
and we'll likely regress targets with any simplistic mapping.  But maybe 
now is the time to address that fundamental problem and deal with the 
fallout.


jeff




Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-10 Thread Bernd Schmidt

On 09/08/2015 04:53 PM, James Greenhalgh wrote:

One big question I have with this patch is how I ought to write a meaningful
cost model I've used. It seems like yet another misuse of RTX costs, and
another bit of stuff for targets to carefully balance. Now, if the
relative cost of branches and conditional move instructions is not
carefully managed, you may enable or disable these optimisations. This is
probably acceptable, but I dislike adding more and more gotcha's to
target costs, as I get bitten by them hard enough as is!


The code you have seems reasonable, except that for compile time it 
might make sense to not even attempt the optimization if the number of 
sets is too large. I'm not too worried about that, but maybe you could 
bail out early if your cost estimate goes too much above the branch cost.



+  /* If we were supposed to read from an earlier write in this block,
+we've changed the register allocation.  Rewire the read.  While
+we are looking, also try to catch a swap idiom.  */


So this is one interesting case; do you also have to worry about others 
(such as maybe setting the same register multiple times)?



+  /* We must have seen some sort of insn to insert, otherwise we were
+ given an empty BB to convert, and we can't handle that.  */
+  if (unmodified_insns.is_empty ())
+{
+  end_sequence ();
+  return FALSE;
+}


Looks like some of the error conditions are tested twice across the two 
new functions? I think it would be better to get rid of one copy or turn 
the second one into a gcc_assert.


> No testcase provided, as currently I don't know of targets with a high
> enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll 
leave it to others to decide whether we want to apply it. Other than the 
points above it looks OK to me.



Bernd



[Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-08 Thread James Greenhalgh

Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
std::swap (d, e)
  [...]

Which is currently going to generate something like

  compare (d, e)
  branch.le L1
tmp = d;
d = e;
e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better
with:

  compare (d, e)
  d1 = if_then_else (le, e, d)
  e1 = if_then_else (le, d, e)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.

This patch introduces this logic to the RTL if convert passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes, which
contain only set instructions and end up unprecdictable.

One big question I have with this patch is how I ought to write a meaningful
cost model I've used. It seems like yet another misuse of RTX costs, and
another bit of stuff for targets to carefully balance. Now, if the
relative cost of branches and conditional move instructions is not
carefully managed, you may enable or disable these optimisations. This is
probably acceptable, but I dislike adding more and more gotcha's to
target costs, as I get bitten by them hard enough as is!

Elsewhere the ifcvt cost usage is pretty lacking - esentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count
the number of moves I would convert, then compare that to the branch cost,
but this feels... wrong. This makes it pretty tough to choose a "good"
number for TARGET_BRANCH_COST. This isn't helped now that higher branch
costs can mean pulling expensive instructions in to the main execution
stream.

I've picked a fairly straightforward cost model for this patch, trying to
compare the cost of each conditional move, as calculated with rtx_costs,
against COSTS_N_INSNS (branch_cost). This essentially kills the
optimisation for any target with conditional-move cost > 1. Personally, I
consider that a pretty horrible bug in this patch - but I couldn't think of
anything better to try.

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build,
I saw 3.9% more CSEL operations with this patch and TARGET_BRANCH_COST set
to 4. Performance is also good on AArch64 on a range of microbenchmarks
and larger workloads (after playing with the branch costs). I didn't see
any performance regression on x86_64, as you would expect given that the
cost models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Call them.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+ if (x > y)
+   { i = a; j = b; k = c; }
+
+   Make it:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? b : j;
+ tmp_k = (x > y) ? c : k;
+ i = tmp_i; <- Should be cleaned up
+ j = tmp_j; <- Likewise.
+ k = tmp_k; <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+ overlap.  */
+