Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
> Some targets have -mbranch-cost to allow overriding the default costing. > visium has a branch cost of 10! Yeah, the GR5 variant is pipelined but has no branch prediction; moreover there is an additional adverse effect coming for the instructions bus... > Several ports have a cost of 6 either unconditionally or when the branch > is not well predicted. 9 for UltraSPARC3, although this should probably be lowered if predictable_p. -- Eric Botcazou
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On 09/11/2015 02:49 AM, Kyrill Tkachov wrote: On 10/09/15 22:11, Jeff Law wrote: On 09/10/2015 12:23 PM, Bernd Schmidt wrote: > No testcase provided, as currently I don't know of targets with a high > enough branch cost to actually trigger the optimisation. Hmm, so the code would not actually be used right now? In that case I'll leave it to others to decide whether we want to apply it. Other than the points above it looks OK to me. Some targets have -mbranch-cost to allow overriding the default costing. visium has a branch cost of 10! Several ports have a cost of 6 either unconditionally or when the branch is not well predicted. Presumably James is more interested in the ARM/AArch64 targets ;-) I think that's probably what James is most interested in getting some ideas around -- the cost model. I think the fundamental problem is BRANCH_COST isn't actually relative to anything other than the default value of "1". It doesn't directly correspond to COSTS_N_INSNS or anything else. So while using COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually doesn't. It's not even clear how a value of 10 relates to a value of 1 other than it's more expensive. ifcvt (and others) comparing to magic #s is more than a bit lame. But with BRANCH_COST having no meaning relative to anything else I can see why Richard did things that way. Out of interest, what was the intended original meaning of branch costs if it was not to be relative to instructions? I don't think it ever had one. It's self-relative. A cost of 2 is greater than a cost of 1. No more, no less IIRC. Lame? Yes. Short-sighted? Yes. Should we try to fix it. Yes. If you look at how BRANCH_COST actually gets used, AFAIK it's tested only against "magic constants", which are themselves lame, short-sighted and need to be fixed. jeff
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On Fri, Sep 11, 2015 at 10:04:12AM +0100, Ramana Radhakrishnan wrote: > On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote: > > On 09/10/2015 11:11 PM, Jeff Law wrote: > > >I think that's probably what James is most interested in getting some > > >ideas around -- the cost model. > > > > > >I think the fundamental problem is BRANCH_COST isn't actually relative > > >to anything other than the default value of "1". It doesn't directly > > >correspond to COSTS_N_INSNS or anything else. So while using > > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > > >doesn't. It's not even clear how a value of 10 relates to a value of 1 > > >other than it's more expensive. > > > > > >ifcvt (and others) comparing to magic #s is more than a bit lame. But > > >with BRANCH_COST having no meaning relative to anything else I can see > > >why Richard did things that way. > > > > > >In an ideal world we'd find some mapping from BRANCH_COST that relates > > >to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > > >and we'll likely regress targets with any simplistic mapping. But maybe > > >now is the time to address that fundamental problem and deal with the > > >fallout. > > > > I think the right approach if we want to fix this is a new > > branch_cost_ninsns target hook (maybe with arguments > > taken_percentage, predictability), and gradually move everything to > > use that instead of BRANCH_COST. > > Perhaps providing backends with the entire if-then-else block along > with the above mentioned information being if converted may be another > approach, it allows the backends to analyse what cases are good to > if-convert as per the ISA or micro-architecture and what aren't. I'm not sure how much of this is likely to be target-dependent and how much can just be abstracted to common ifcvt code resuing rtx_costs. I've been sketching out a rough idea of a more applicable cost model for RTL ifcvt, taking in to consideration what David mentioned regarding the talks at cauldron. The question we want to ask is: Which is preferable between: Before: (COSTS_N_INSNS cost of the compare+branch insns at the tail of the if BB. ??? (possibly) some factor related to BRANCH_COST) + weighted cost of then BB. + (if needed) weighted cost of else BB. After: seq_cost the candidate new sequence. The weighted cost of the two BBs should mix in some idea as to the relative probability that we execute them. The tough part is figuring out how to (reasonably) factor in branch cost. The reason that is tough is that BRANCH_COST is used inconsistently. Normally it is not measured relative to anything, but is compared against magic numbers for optimizations (each of which are really their own question to be posed as above). I don't have a good answer to that, nor a good answer as to what BRANCH_COST should represent in future. The use in the compiler is sort-of consistent with a measurement against instruction counts (i.e. a branch cost of 3 means a branch is equivalent to 3 cheap instructions), but is sometimes just used as a measure of expensive (a branch cost of >= 2 means that abs should be expanded using a sequence of bit operations). I'll look in to how the code in ifcvt starts to look with a modified cost model and get back to you... James
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote: > On 09/10/2015 11:11 PM, Jeff Law wrote: > >I think that's probably what James is most interested in getting some > >ideas around -- the cost model. > > > >I think the fundamental problem is BRANCH_COST isn't actually relative > >to anything other than the default value of "1". It doesn't directly > >correspond to COSTS_N_INSNS or anything else. So while using > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually > >doesn't. It's not even clear how a value of 10 relates to a value of 1 > >other than it's more expensive. > > > >ifcvt (and others) comparing to magic #s is more than a bit lame. But > >with BRANCH_COST having no meaning relative to anything else I can see > >why Richard did things that way. > > > >In an ideal world we'd find some mapping from BRANCH_COST that relates > >to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist > >and we'll likely regress targets with any simplistic mapping. But maybe > >now is the time to address that fundamental problem and deal with the > >fallout. > > I think the right approach if we want to fix this is a new > branch_cost_ninsns target hook (maybe with arguments > taken_percentage, predictability), and gradually move everything to > use that instead of BRANCH_COST. Perhaps providing backends with the entire if-then-else block along with the above mentioned information being if converted may be another approach, it allows the backends to analyse what cases are good to if-convert as per the ISA or micro-architecture and what aren't. regards Ramana
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On 09/10/2015 11:11 PM, Jeff Law wrote: I think that's probably what James is most interested in getting some ideas around -- the cost model. I think the fundamental problem is BRANCH_COST isn't actually relative to anything other than the default value of "1". It doesn't directly correspond to COSTS_N_INSNS or anything else. So while using COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually doesn't. It's not even clear how a value of 10 relates to a value of 1 other than it's more expensive. ifcvt (and others) comparing to magic #s is more than a bit lame. But with BRANCH_COST having no meaning relative to anything else I can see why Richard did things that way. In an ideal world we'd find some mapping from BRANCH_COST that relates to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist and we'll likely regress targets with any simplistic mapping. But maybe now is the time to address that fundamental problem and deal with the fallout. I think the right approach if we want to fix this is a new branch_cost_ninsns target hook (maybe with arguments taken_percentage, predictability), and gradually move everything to use that instead of BRANCH_COST. Bernd
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On 10/09/15 22:11, Jeff Law wrote: On 09/10/2015 12:23 PM, Bernd Schmidt wrote: > No testcase provided, as currently I don't know of targets with a high > enough branch cost to actually trigger the optimisation. Hmm, so the code would not actually be used right now? In that case I'll leave it to others to decide whether we want to apply it. Other than the points above it looks OK to me. Some targets have -mbranch-cost to allow overriding the default costing. visium has a branch cost of 10! Several ports have a cost of 6 either unconditionally or when the branch is not well predicted. Presumably James is more interested in the ARM/AArch64 targets ;-) I think that's probably what James is most interested in getting some ideas around -- the cost model. I think the fundamental problem is BRANCH_COST isn't actually relative to anything other than the default value of "1". It doesn't directly correspond to COSTS_N_INSNS or anything else. So while using COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually doesn't. It's not even clear how a value of 10 relates to a value of 1 other than it's more expensive. ifcvt (and others) comparing to magic #s is more than a bit lame. But with BRANCH_COST having no meaning relative to anything else I can see why Richard did things that way. Out of interest, what was the intended original meaning of branch costs if it was not to be relative to instructions? Thanks, Kyrill In an ideal world we'd find some mapping from BRANCH_COST that relates to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist and we'll likely regress targets with any simplistic mapping. But maybe now is the time to address that fundamental problem and deal with the fallout. jeff
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On 09/10/2015 12:23 PM, Bernd Schmidt wrote: > No testcase provided, as currently I don't know of targets with a high > enough branch cost to actually trigger the optimisation. Hmm, so the code would not actually be used right now? In that case I'll leave it to others to decide whether we want to apply it. Other than the points above it looks OK to me. Some targets have -mbranch-cost to allow overriding the default costing. visium has a branch cost of 10! Several ports have a cost of 6 either unconditionally or when the branch is not well predicted. Presumably James is more interested in the ARM/AArch64 targets ;-) I think that's probably what James is most interested in getting some ideas around -- the cost model. I think the fundamental problem is BRANCH_COST isn't actually relative to anything other than the default value of "1". It doesn't directly correspond to COSTS_N_INSNS or anything else. So while using COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually doesn't. It's not even clear how a value of 10 relates to a value of 1 other than it's more expensive. ifcvt (and others) comparing to magic #s is more than a bit lame. But with BRANCH_COST having no meaning relative to anything else I can see why Richard did things that way. In an ideal world we'd find some mapping from BRANCH_COST that relates to CONST_N_INSNS. I suspect a simple mapping doesn't necessarily exist and we'll likely regress targets with any simplistic mapping. But maybe now is the time to address that fundamental problem and deal with the fallout. jeff
Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
On 09/08/2015 04:53 PM, James Greenhalgh wrote: One big question I have with this patch is how I ought to write a meaningful cost model I've used. It seems like yet another misuse of RTX costs, and another bit of stuff for targets to carefully balance. Now, if the relative cost of branches and conditional move instructions is not carefully managed, you may enable or disable these optimisations. This is probably acceptable, but I dislike adding more and more gotcha's to target costs, as I get bitten by them hard enough as is! The code you have seems reasonable, except that for compile time it might make sense to not even attempt the optimization if the number of sets is too large. I'm not too worried about that, but maybe you could bail out early if your cost estimate goes too much above the branch cost. + /* If we were supposed to read from an earlier write in this block, +we've changed the register allocation. Rewire the read. While +we are looking, also try to catch a swap idiom. */ So this is one interesting case; do you also have to worry about others (such as maybe setting the same register multiple times)? + /* We must have seen some sort of insn to insert, otherwise we were + given an empty BB to convert, and we can't handle that. */ + if (unmodified_insns.is_empty ()) +{ + end_sequence (); + return FALSE; +} Looks like some of the error conditions are tested twice across the two new functions? I think it would be better to get rid of one copy or turn the second one into a gcc_assert. > No testcase provided, as currently I don't know of targets with a high > enough branch cost to actually trigger the optimisation. Hmm, so the code would not actually be used right now? In that case I'll leave it to others to decide whether we want to apply it. Other than the points above it looks OK to me. Bernd
[Patch] Teach RTL ifcvt to handle multiple simple set instructions
Hi, RTL "noce" ifcvt will currently give up if the branches it is trying to make conditional are too complicated. One of the conditions for "too complicated" is that the branch sets more than one value. One common idiom that this misses is something like: int d = a[i]; int e = b[i]; if (d > e) std::swap (d, e) [...] Which is currently going to generate something like compare (d, e) branch.le L1 tmp = d; d = e; e = tmp; L1: In the case that this is an unpredictable branch, we can do better with: compare (d, e) d1 = if_then_else (le, e, d) e1 = if_then_else (le, d, e) d = d1 e = e1 Register allocation will eliminate the two trailing unconditional assignments, and we get a neater sequence. This patch introduces this logic to the RTL if convert passes, catching cases where a basic block does nothing other than multiple SETs. This helps both with the std::swap idiom above, and with pathological cases where tree passes create new basic blocks to resolve Phi nodes, which contain only set instructions and end up unprecdictable. One big question I have with this patch is how I ought to write a meaningful cost model I've used. It seems like yet another misuse of RTX costs, and another bit of stuff for targets to carefully balance. Now, if the relative cost of branches and conditional move instructions is not carefully managed, you may enable or disable these optimisations. This is probably acceptable, but I dislike adding more and more gotcha's to target costs, as I get bitten by them hard enough as is! Elsewhere the ifcvt cost usage is pretty lacking - esentially counting the number of instructions which will be if-converted and comparing that against the magic number "2". I could follow this lead and just count the number of moves I would convert, then compare that to the branch cost, but this feels... wrong. This makes it pretty tough to choose a "good" number for TARGET_BRANCH_COST. This isn't helped now that higher branch costs can mean pulling expensive instructions in to the main execution stream. I've picked a fairly straightforward cost model for this patch, trying to compare the cost of each conditional move, as calculated with rtx_costs, against COSTS_N_INSNS (branch_cost). This essentially kills the optimisation for any target with conditional-move cost > 1. Personally, I consider that a pretty horrible bug in this patch - but I couldn't think of anything better to try. As you might expect, this triggers all over the place when TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build, I saw 3.9% more CSEL operations with this patch and TARGET_BRANCH_COST set to 4. Performance is also good on AArch64 on a range of microbenchmarks and larger workloads (after playing with the branch costs). I didn't see any performance regression on x86_64, as you would expect given that the cost models preclude x86_64 targets from ever hitting this optimisation. Bootstrapped and tested on x86_64 and AArch64 with no issues, and bootstrapped and tested with the cost model turned off, to have some confidence that we will continue to do the right thing if any targets do up their branch costs and start using this code. No testcase provided, as currently I don't know of targets with a high enough branch cost to actually trigger the optimisation. OK? Thanks, James --- gcc/ 2015-09-07 James Greenhalgh * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New. (noce_convert_multiple_sets): Likewise. (noce_process_if_block): Call them. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 157a716..059bd89 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, return false; } +/* We have something like: + + if (x > y) + { i = a; j = b; k = c; } + + Make it: + + tmp_i = (x > y) ? a : i; + tmp_j = (x > y) ? b : j; + tmp_k = (x > y) ? c : k; + i = tmp_i; <- Should be cleaned up + j = tmp_j; <- Likewise. + k = tmp_k; <- Likewise. + + Look for special cases such as use of temporary registers (for + example in a swap idiom). + + IF_INFO contains the useful information about the block structure and + jump instructions. */ + +static int +noce_convert_multiple_sets (struct noce_if_info *if_info) +{ + basic_block test_bb = if_info->test_bb; + basic_block then_bb = if_info->then_bb; + basic_block join_bb = if_info->join_bb; + rtx_insn *jump = if_info->jump; + rtx_insn *cond_earliest; + unsigned int cost = 0; + rtx_insn *insn; + + start_sequence (); + + /* Decompose the condition attached to the jump. */ + rtx cond = noce_get_condition (jump, &cond_earliest, false); + rtx x = XEXP (cond, 0); + rtx y = XEXP (cond, 1); + rtx_code cond_code = GET_CODE (cond); + + /* The true targets for a conditional move. */ + vec targets = vNULL; + /* The temporaries introduced to allow us to not consider register + overlap. */ +