Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

Richard Biener Wed, 30 Sep 2015 01:05:37 -0700

On Tue, Sep 29, 2015 at 4:31 PM, James Greenhalgh
<james.greenha...@arm.com> wrote:
> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>> > Hi,
>> >
>> > In relation to the patch I put up for review a few weeks ago to teach
>> > RTL if-convert to handle multiple sets in a basic block [1], I was
>> > asking about a sensible cost model to use. There was some consensus at
>> > Cauldron that what should be done in this situation is to introduce a
>> > target hook that delegates answering the question to the target.
>>
>> Err - the consensus was to _not_ add gazillion of special target hooks
>> but instead enhance what we have with rtx_cost so that passes can
>> rely on comparing before and after costs of a sequence of insns.
>
> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
> "consensus" from the video. Rewatching it now, I see a better phrase would
> be "suggestion with some support".
>
> Watching the video a second time, it seems your proposal is that we improve
> the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
> get us some way to making a smart decision in if-convert, but I'm not
> convinced it allows us to answer the question we are interested in.
>
> We have the rtx for before and after, and we can generate costs for these
> sequences. This allows us to calculate some weighted cost of the
> instructions based on the calculated probabilities that each block is
> executed. However, we are missing information on how expensive the branch
> is, and we have no way to get that through an RTX-costs infrastructure.


Yeah, though during the meeting at the Cauldron I was asking on whether we
maybe want a replacement_cost hook that can assume the to-be-replaced
sequence is in the IL thus the hook can inspect insn_bb and thus get at
branch costs ...

Surely the proposed rtx_cost infrastructure enhancements will not
cover all cases
so the thing I wanted to throw in was that there was _not_ consensus that cost
should be computed by pass specific target hooks that allow the target
to inspect
pass private data.  Because that's a maintainance nightmare if you change a pass
and have to second guess 50 targets cost hook implementations.

> We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
> on its predictability. This is difficult as COSTS_N_INSNS units can differ
> depending on whether you are talking about floating-point or integer code.

Yes, which is why I suggested a replacement cost ...  Of course the question is
what you feed that hook as in principle it would be nice to avoid building the
replacement RTXen until we know doing that will be necessary (the replacement
is profitable).

Maybe as soon as CFG changes are involved we need to think about adding
a BB frequency to the hook though factoring in that can be done in the passes.
What is the interesting part for the target is probably the cost of
the branch itself
as that can vary depending on predictability (which is usually very
hard to assess
at compile-time anyway).  So what about a branch_cost hook that takes
taken/not-taken probabilities as argument?

Note that the agreement during the discussion was that all costs need to be
comparable.

> By this I mean, the compiler considers a SET which costs more than
> COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
> of both an integer SET and a floating-point SET to both be COSTS_N_INSNS (1).
> In reality, these instructions may have different latency performance
> characteristics. What real world quantity are we trying to invoke when we
> say a branch costs the same as 3 SET instructions of any type? It certainly
> isn't mispredict penalty (likely measured in cycles, not relative to the cost
> of a SET instruction, which may well be completely free on modern x86
> processors), nor is it the cost of executing the branch instruction which
> is often constant to resolve regardless of predicted/mispredicted status.
>
> On the other side of the equation, we want a cost for the converted
> sequence. We can build a cost of the generated rtl sequence, but for
> targets like AArch64 this is going to be wildly off. AArch64 will expand
> (a > b) ? x : y; as a set to the CC register, followed by a conditional
> move based on the CC register. Consequently, where we have multiple sets
> back to back we end up with:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set CC (a > b)
>   set x2 (CC ? x : z)
>   set CC (a > b)
>   set x3 (CC ? x : k)
>
> Which we know will be simplified later to:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set x2 (CC ? x : z)
>   set x3 (CC ? x : k)
>
> I imagine other targets have something similar in their expansion of
> mov<mode>cc (though I haven't looked).
>
> Our comparison for if-conversion then must be:
>
>   weighted_old_cost = (taken_probability * (then_bb_cost)
>                         - (1 - taken_probability) * (else_bb_cost));
>   branch_cost = branch_cost_in_insns (taken_probability)
>   weighted_new_cost = redundancy_factor (new_sequence) * seq_cost 
> (new_sequence)
>
>   profitable = weighted_new_cost <= weighted_old_cost + branch_cost
>
> And we must define:
>
>   branch_cost_in_insns (taken_probability)
>   redundancy_factor (new_sequence)
>
> At that point, I feel you are better giving the entire sequence to the
> target and asking it to implement whatever logic is needed to return a
> profitable/unprofitable analysis of the transformation.

Sure, that was what was suggested at the Cauldron - rtx_cost on individual
insns (rtx cost doesn't even work on that level usually!) isn't coarse enough.
We're C++ now so we'd pass the hook an iterator which it can use to
iterate over the insns it should compute the cost of.

> The "redundancy_factor" in particular is pretty tough to define in a way
> which makes sense outside of if_convert, without adding some pretty
> detailed analysis to decide what might or might not be eliminated by
> later passes. The alternative is to weight the other side of the equation
> by tuning the cost of branch_cost_in_insns high. This only serves to increase
> the disconnect between a real-world cost and a number to tweak to game
> code generation.
>
> If you have a different way of phrasing the if-conversion question that
> avoids the two very specific hooks, I'd be happy to try taking the patches
> in that direction. I don't see a way to implement this as just queries to
> a costing function which does not need substantial target and pass
> dependent tweaking to make behave correctly.

>From the patch I can't even see what the question is ;)  I only see
"is the transform profitable".

I saw from your hook implementation that it's basically a set of magic
numbers.  So I suggested to make those numbers target configurable
instead.

IMHO we shouldn't accept new cost hooks that are only implemented
by a single target.  Instead it should be proved that the hook can
improve code on multiple targets.

Richard.

> Thanks,
> James
>
>> > This patch series introduces that new target hook to provide cost
>> > decisions for the RTL ifcvt pass.
>> >
>> > The idea is to give the target full visibility of the proposed
>> > transformation, and allow it to respond as to whether if-conversion in that
>> > way is profitable.
>> >
>> > In order to preserve current behaviour across targets, we will need the
>> > default implementation to keep to the strategy of simply comparing branch
>> > cost against a magic number. Patch 1/3 performs this refactoring, which is
>> > a bit hairy in some corner cases.
>> >
>> > Patch 2/3 is a simple code move, pulling the definition of the if_info
>> > structure used by RTL if-convert in to ifcvt.h where it can be included
>> > by targets.
>> >
>> > Patch 3/3 then introduces the new target hook, with the same default
>> > behaviour as was previously in noce_is_profitable_p.
>> >
>> > The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
>> > I've verified with Spec2000 and Spec2006 runs that there are no code
>> > generation differences for any of these three targets after the patch.
>> >
>> > I also gave ultrasparc3 a quick go, from what I could see, I changed the
>> > register allocation for the floating-point condition code registers.
>> > Presumably this is a side effect of first constructing RTXen that I then
>> > discard. I didn't see anything which looked like more frequent reloads or
>> > substantial code generation changes, though I'm not familiar with the
>> > intricacies of the Sparc condition registers :).
>> >
>> > I've included a patch 4/3, to give an example of what a target might want
>> > to do with this hook. It needs work for tuning and deciding how the 
>> > function
>> > should actually behave, but works if it is thought of as more of a
>> > strawman/prototype than a patch submission.
>> >
>> > Are parts 1, 2 and 3 OK?
>> >
>> > Thanks,
>> > James
>> >
>> > [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
>> >
>> > ---
>> > [Patch ifcvt 1/3] Factor out cost calculations from noce cases
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenha...@arm.com>
>> >
>> >         * ifcvt.c (noce_if_info): Add a magic_number field :-(.
>> >         (noce_is_profitable_p): New.
>> >         (noce_try_store_flag_constants): Move cost calculation
>> >         to after sequence generation, factor it out to 
>> > noce_is_profitable_p.
>> >         (noce_try_addcc): Likewise.
>> >         (noce_try_store_flag_mask): Likewise.
>> >         (noce_try_cmove): Likewise.
>> >         (noce_try_cmove_arith): Likewise.
>> >         (noce_try_sign_mask): Add comment regarding cost calculations.
>> >
>> > [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenha...@arm.com>
>> >
>> >         * ifcvt.c (noce_if_info): Move to...
>> >         * ifcvt.h (noce_if_info): ...Here.
>> >
>> > [Patch ifcvt 3/3] Create a new target hook for deciding profitability
>> >     of noce if-conversion
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenha...@arm.com>
>> >
>> >         * target.def (costs): New hook vector.
>> >         (ifcvt_noce_profitable_p): New hook.
>> >         * doc/tm.texi.in: Document it.
>> >         * doc/tm.texi: Regenerate.
>> >         * targhooks.h (default_ifcvt_noce_profitable_p): New.
>> >         * targhooks.c (default_ifcvt_noce_profitable_p): New.
>> >         * ifcvt.c (noce_profitable_p): Use new target hook.
>> >
>> > [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
>> >     hook for AArch64
>> >
>> > 2015-09-26  James Greenhalgh  <james.greenha...@arm.com>
>> >
>> >         * config/aarch64/aarch64.c
>> >         (aarch64_additional_branch_cost_for_probability): New.
>> >         (aarch64_ifcvt_noce_profitable_p): Likewise.
>> >         (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
>>

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

Reply via email to