Richard Henderson wrote:
> On 07/27/2011 06:21 AM, Georg-Johann Lay wrote:
>> +(define_insn_and_split "*mulsi3"
>> +  [(set (match_operand:SI 0 "pseudo_register_operand"                      
>> "=r")
>> +        (mult:SI (match_operand:SI 1 "pseudo_register_operand"              
>> "r")
>> +                 (match_operand:SI 2 "pseudo_register_or_const_int_operand" 
>> "rn")))
>> +   (clobber (reg:DI 18))]
>> +  "AVR_HAVE_MUL && !reload_completed"
>> +  { gcc_unreachable(); }
>> +  "&& 1"
>> +  [(set (reg:SI 18)
>> +        (match_dup 1))
> 
> That seems like it's guaranteed to force an unnecessary move.

It's the same as with the present implementation:


long mul (long a, long b)
{
    return a*b;
}

long mul2 (long a, long b)
{
    return b*a;
}

translates -Os -mmcu=atmega8 to

mul:
        rcall __mulsi3
        ret

mul2:
        push r12
        push r13
        push r14
        push r15
        movw r12,r22
        movw r14,r24
        movw r24,r20
        movw r22,r18
        movw r20,r14
        movw r18,r12
        rcall __mulsi3
        pop r15
        pop r14
        pop r13
        pop r12
        ret

> Have you tried defining special-purpose register classes to
> force reload to move the data into the right hard regs?
> 
> E.g.  "Y" prefix
>       "QHS" size
>       two digit starting register number, as needed.

I already thought about such register classes/constraints with
almost the same nomenclature, i.e. with prefix "R".

> You'll probably end up with quite a few register classes 
> out of this, but hopefully reload can do a better job than
> you can manually...

Agreed.

insns that will benefit are insns with two input operands that
commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.

Maybe even other 2-input insns could benefit because there's no
predetermined order in which the moves are accomplished; e.g.
moving R24 before R22 in udivmodqi4.  I don't know if register
allocator is smart enough to swap the assignments if that is
better.

Moreover, it would reduce the number of insns resp. split
patterns and help cleanup md.

I'd prefer to do that work in a separate patch.  The current patch
behaves the same as the old code, so it's not a performance
regression of the current patch.

Johann




Reply via email to