Hi,

On Mon, 5 Mar 2007, Ian Lance Taylor wrote:

> > whereas with -fno-split-wide-types it generates this:
> > 
> >         move.l 16(%sp),%d0
> >         move.l 20(%sp),%d1
> >         move.l 8(%sp),%d2
> >         add.l 12(%sp),%d1
> >         addx.l %d2,%d0
> > 
> > How can I get rid of these extra move instructions?
> 
> The standard answer would be to add a define_split for the adddi3 insn
> which triggers before reload.  But that is problematic on a CC0 system
> where you want to preserve the overflow flag.  I'm not sure what to
> suggest at the moment.  Note that there is still an extra move.l insn
> in the -fno-split-wide-types version.

Actually it's correct, addx.l needs two registers. :)

> > Another more general question would be how should be wide registers 
> > handled in general. In the past I tried to avoid splitting instructions 
> > before reload, exactly because the extra subregs caused worse code. Has 
> > this changed? AFAICT this would mean in the back end to split DI values as 
> > early as possible, which could have its advantages, but also its 
> > challenges, as m68k is still a cc0 target and with instructions like 
> > addx.l above, so far I avoided splitting these at all.
> 
> Yes, it is in general better now to split double-word length
> operations before reload.  It's not necessarily better to split as
> early as possible, as that will essentially disable the RTL level loop
> optimizations. 

I was worried about that too, since some patterns would be more 
complicated, which may make some optimizations more difficult, e.g. where 
I noticed it first was mulsidi3, which currently looks like this:

  [(set (match_operand:SI 0 "register_operand" "=d")
        (mult:SI (match_operand:SI 1 "register_operand" "%0")
                  (match_operand:SI 2 "nonimmediate_operand" "dm")))
   (set (match_operand:SI 3 "register_operand" "=d")
        (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (match_dup 1))
                                           (zero_extend:DI (match_dup 2)))
                                  (const_int 32))))]

The dataflow branch doesn't like this due to the missing 
(strict_low_part), it's easy to fix by changing it to:

  [(set (match_operand:DI 0 "register_operand" "=d")
        (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "%0"))
                 (zero_extend:DI (match_operand:SI 2 "nonimmediate_operand" 
"dm"))))]

which is also easier to recognize as multiply for the rtl optimizers.
OTOH combine runs reasonably late, so it's quite easy to add a split to 
bring it back to the first form (plus strict_low_part).

> But it's still problematic to split before reload on a CC0 system.

I considered it splitting like this.

(parallel
  (set (op0) (plus (op1) (op2)))
  (set (cc0) (unspec [(op1) (op2)])))

(set (op0) (plus (plus (op1) (op2))
                 (unspec [(cc0)])))

This should pass as cc0 setter/getter and probably needs additional work 
in notice_update_cc0().

bye, Roman

Reply via email to