On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
> 
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.

So try this testcase on ARM.

unsigned long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}


Without the patch, combine rips apart a compare and subtract insn
because it sees it as having cost zero and substitutes it with separate
compare and subtract insns.

ie before:


        ldr     r3, .L5
        ldr     r2, .L5+4
        ldr     r3, [r3]
        ldr     r2, [r2]
        cmp     r3, r2    <=====
        movcs   r0, #0
        movcc   r0, #1
        ldr     ip, .L5+8
        ldr     r1, .L5+12
        sub     r3, r3, r2  <=====
        str     r3, [ip]
        str     r0, [r1]
        bx      lr

after:

        ldr     r3, .L5
        ldr     r2, .L5+4
        ldr     r3, [r3]
        ldr     r2, [r2]
        subs    r3, r3, r2  <====
        movcc   r1, #1
        movcs   r1, #0
        ldr     r0, .L5+8
        ldr     r2, .L5+12
        str     r3, [r0]
        str     r1, [r2]
        bx      lr

The combine log before the patch shows:

allowing combination of insns 10 and 51
original costs 0 + 8 = 0
replacement costs 4 + 12 = 16

So it is clearly deciding that the original costs are greater than the
replacement costs.

> 
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
> 
> I'll test this on a zillion archs, see what the effect is.
> 
> Have you considered costing general parallels as well?
> 
> 

I thought about it but concluded that there's no generically correct
answer.  It might be the max of all the individual sets or it might be
the sum, or it might be somewhere in between.  For example on ARM the
load/store multiple operations are expressed as parallels, but their
cost will depend on how many loads/stores happen in parallel within the
hardware.

I think we'd need a new back-end hook to handle the other cases sensibly.

R.

> Segher
> 

Reply via email to