Hi Bingfeng,

> Hello,
> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction.  Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.


I checked this on our private port of GCC, which is based on the 4.3
branch that we are working from right now. We do use the doloop
pattern for these cases in our port, and I can confirm that we
generate the following code for your example. Our tree does have a
few other tweaks that we maintain and would like to contribute once
the copyright assignments are in place.

Unroll:
       c2c     $c5,$c2
       i2cs    $c4,63
.L2:
       ldw     $c2,($c5)+=1
       add     $c2,$c1,$c2
       stw     ($c3)+=1,$c2
       brinzdec        $c4,.L2
       brz     $zero,$link

You probably want to look at the mt backend for an example of how to
do it. It looks similar to how we do it in ours.
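
For reference, here is a rough C sketch of the two loop shapes from the
dumps in the quoted message below: the count-down form that ivcanon
produces and the byte-offset form that ivopts leaves behind. The
function names and TRIP_COUNT are made up for illustration, and the
code only mirrors the control flow of the dumps. It is not what GCC
emits internally.

```c
#include <stddef.h>

#define TRIP_COUNT 64  /* hypothetical name for the constant bound */

/* Shape after ivcanon: a separate counter starts at the trip count,
   is decremented once per iteration and is compared against zero.
   This is the "reg = reg - 1; if (reg != 0) goto label" form that
   the doloop_condition_get comment describes.  */
void add_scalar_countdown(short s, const int *restrict in,
                          int *restrict out)
{
    int addend = s;
    unsigned int remaining = TRIP_COUNT;
    int i = 0;

    do {
        out[i] = in[i] + addend;
        i++;
        remaining--;
    } while (remaining != 0);
}

/* Shape after ivopts: the only induction variable left is a byte
   offset that counts up by sizeof(int) and is compared against the
   total size, so no decrement-and-test-for-zero remains.  */
void add_scalar_byte_offset(short s, const int *restrict in,
                            int *restrict out)
{
    int addend = s;
    size_t offset = 0;

    do {
        *(int *)((char *)out + offset) =
            *(const int *)((const char *)in + offset) + addend;
        offset += sizeof(int);
    } while (offset != TRIP_COUNT * sizeof(int));
}
```

Both functions compute the same result. The difference is only in
which variable controls the exit test, and that is the part the
doloop code looks at.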


cheers
Ramana

----
Ramana Radhakrishnan
Icera Semiconductor

On Wed, Jul 16, 2008 at 12:05 PM, Bingfeng Mei <[EMAIL PROTECTED]> wrote:
> Hello,
> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction.  Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.
>
>  /* The canonical doloop pattern we expect has one of the following
>     forms:
>
>     1)  (parallel [(set (pc) (if_then_else (condition)
>                                            (label_ref (label))
>                                            (pc)))
>                     (set (reg) (plus (reg) (const_int -1)))
>                     (additional clobbers and uses)])
>
>     The branch must be the first entry of the parallel (also required
>     by jump.c), and the second entry of the parallel must be a set of
>     the loop counter register.  Some targets (IA-64) wrap the set of
>     the loop counter in an if_then_else too.
>
>     2)  (set (reg) (plus (reg) (const_int -1))
>         (set (pc) (if_then_else (reg != 0)
>                                 (label_ref (label))
>                                 (pc))).  */
>
>
> Here is the simple function I used. It should meet all the doloop
> optimization requirements.
> void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
> {
>        int i;
>        for (i=0; i<64; i++)
>        {
>                out[i] = b_inout[i] +  s;
>        }
> }
>
>
> In tree ivcanon pass, it is converted to
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
>  unsigned int ivtmp.14;
>  int pretmp.9;
>  long unsigned int pretmp.8;
>  int storetmp.6;
>  int i;
>  int D.1459;
>  int D.1458;
>  int D.1457;
>  int * D.1456;
>  int * D.1455;
>  long unsigned int D.1454;
>  long unsigned int D.1453;
>
> <bb 2>:
>  pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
>  # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
>  # i_19 = PHI <i_15(4), 0(2)>
>  D.1453_3 = (long unsigned int) i_19;
>  D.1454_4 = D.1453_3 * 4;
>  D.1455_6 = out_5(D) + D.1454_4;
>  D.1456_10 = b_inout_9(D) + D.1454_4;
>  D.1457_11 = *D.1456_10;
>  D.1459_14 = pretmp.9_8 + D.1457_11;
>  *D.1455_6 = D.1459_14;
>  i_15 = i_19 + 1;
>  ivtmp.14_21 = ivtmp.14_13 - 1;
>  if (ivtmp.14_21 != 0)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
> <bb 4>:
>  goto <bb 3>;
>
> <bb 5>:
>  return;
>
> }
>
>
> This should match requirements of doloop_condition_get.  But after
> ivopts pass, the code is transformed to:
>
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
>  long unsigned int ivtmp.21;
>  unsigned int ivtmp.14;
>  int pretmp.9;
>  long unsigned int pretmp.8;
>  int storetmp.6;
>  int i;
>  int D.1459;
>  int D.1458;
>  int D.1457;
>  int * D.1456;
>  int * D.1455;
>  long unsigned int D.1454;
>  long unsigned int D.1453;
>
> <bb 2>:
>  pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
>  # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
>  D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
>  D.1459_14 = pretmp.9_8 + D.1457_11;
>  MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
>  ivtmp.21_16 = ivtmp.21_7 + 4;
>  if (ivtmp.21_16 != 256)
>    goto <bb 4>;
>  else
>    goto <bb 5>;
>
> <bb 4>:
>  goto <bb 3>;
>
> <bb 5>:
>  return;
>
> }
>
>
> It is no longer in the required canonical form, and later RTL-level
> optimizations cannot convert it back. Since it doesn't pass the
> doloop_condition_get test, the modulo scheduling pass doesn't work
> either.  Am I missing something here?  Any hint is greatly appreciated.
>
> Cheers,
> Bingfeng Mei
>
>
>



-- 
Ramana Radhakrishnan
