Hi Bingfeng,

> Hello,
> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction. Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.

I checked this on our private port of GCC. It is based on the 4.3
branch, which is what we are working on right now. We do use the
doloop pattern to generate these cases in our port, and I can confirm
that in our case we generate the following code. Our tree does have a
few other tweaks that we maintain and would like to contribute once
the copyright assignments are in place.

Unroll:
        c2c $c5,$c2
        i2cs $c4,63
.L2:
        ldw $c2,($c5)+=1
        add $c2,$c1,$c2
        stw ($c3)+=1,$c2
        brinzdec $c4,.L2
        brz $zero,$link

You probably want to look at the mt backend for an example of how to
do it. It looks similar to how we do it in ours.

cheers
Ramana

----
Ramana Radhakrishnan
Icera Semiconductor

On Wed, Jul 16, 2008 at 12:05 PM, Bingfeng Mei <[EMAIL PROTECTED]> wrote:
> Hello,
> I tried to use doloop_end pattern to reduce loop overhead for our target
> processor, which features a dedicated loop instruction. Somehow even a
> simple loop just cannot pass the test of doloop_condition_get, which
> requires following canonical pattern.
>
> /* The canonical doloop pattern we expect has one of the following
>    forms:
>
>    1)  (parallel [(set (pc) (if_then_else (condition)
>                                           (label_ref (label))
>                                           (pc)))
>                   (set (reg) (plus (reg) (const_int -1)))
>                   (additional clobbers and uses)])
>
>    The branch must be the first entry of the parallel (also required
>    by jump.c), and the second entry of the parallel must be a set of
>    the loop counter register.  Some targets (IA-64) wrap the set of
>    the loop counter in an if_then_else too.
>
>    2)  (set (reg) (plus (reg) (const_int -1))
>        (set (pc) (if_then_else (reg != 0)
>                                (label_ref (label))
>                                (pc))).  */
>
>
> Here is a simple function I used, it should meet all doloop optimization
> requirements.
> void Unroll( short s, int * restrict b_inout, int *restrict out, int N)
> {
>         int i;
>         for (i=0; i<64; i++)
>         {
>                 out[i] = b_inout[i] + s;
>         }
> }
>
>
> In tree ivcanon pass, it is converted to
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
>   unsigned int ivtmp.14;
>   int pretmp.9;
>   long unsigned int pretmp.8;
>   int storetmp.6;
>   int i;
>   int D.1459;
>   int D.1458;
>   int D.1457;
>   int * D.1456;
>   int * D.1455;
>   long unsigned int D.1454;
>   long unsigned int D.1453;
>
> <bb 2>:
>   pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
>   # ivtmp.14_13 = PHI <ivtmp.14_21(4), 64(2)>
>   # i_19 = PHI <i_15(4), 0(2)>
>   D.1453_3 = (long unsigned int) i_19;
>   D.1454_4 = D.1453_3 * 4;
>   D.1455_6 = out_5(D) + D.1454_4;
>   D.1456_10 = b_inout_9(D) + D.1454_4;
>   D.1457_11 = *D.1456_10;
>   D.1459_14 = pretmp.9_8 + D.1457_11;
>   *D.1455_6 = D.1459_14;
>   i_15 = i_19 + 1;
>   ivtmp.14_21 = ivtmp.14_13 - 1;
>   if (ivtmp.14_21 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
> <bb 4>:
>   goto <bb 3>;
>
> <bb 5>:
>   return;
>
> }
>
>
> This should match requirements of doloop_condition_get. But after
> ivopts pass, the code is transformed to:
>
> ;; Function Unroll (Unroll)
>
> Unroll (short int s, int * restrict b_inout, int * restrict out, int N)
> {
>   long unsigned int ivtmp.21;
>   unsigned int ivtmp.14;
>   int pretmp.9;
>   long unsigned int pretmp.8;
>   int storetmp.6;
>   int i;
>   int D.1459;
>   int D.1458;
>   int D.1457;
>   int * D.1456;
>   int * D.1455;
>   long unsigned int D.1454;
>   long unsigned int D.1453;
>
> <bb 2>:
>   pretmp.9_8 = (int) s_12(D);
>
> <bb 3>:
>   # ivtmp.21_7 = PHI <ivtmp.21_16(4), 0(2)>
>   D.1457_11 = MEM[base: b_inout_9(D), index: ivtmp.21_7];
>   D.1459_14 = pretmp.9_8 + D.1457_11;
>   MEM[base: out_5(D), index: ivtmp.21_7] = D.1459_14;
>   ivtmp.21_16 = ivtmp.21_7 + 4;
>   if (ivtmp.21_16 != 256)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
> <bb 4>:
>   goto <bb 3>;
>
> <bb 5>:
>   return;
>
> }
>
>
> It is not required canonical form anymore. And later RTL level
> optimizations cannot convert it back. Since it doesn't pass the
> doloop_condition_get test, modulo scheduling pass doesn't work too. Do
> I miss something here? Any hint is greatly appreciated.
>
> Cheers,
> Bingfeng Mei
>
>
>

--
Ramana Radhakrishnan
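
For readers comparing the two quoted dumps: the following is a minimal C sketch, not taken from the thread, of the two loop shapes. The function names (unroll_countdown, unroll_offset, check_both_forms) are made up for illustration. The first form keeps a separate counter that is decremented and tested against zero, roughly the source-level analogue of the second canonical form quoted from doloop_condition_get. The second form uses the byte offset as the only induction variable and compares it against the end offset, as in the ivopts dump (4 and 256 there, assuming a 4-byte int). Both compute the same result; only the exit test differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TRIP_COUNT = 64 };   /* same trip count as the quoted Unroll() */

/* Shape of the ivcanon dump: a dedicated counter runs 64 -> 0 and the
   loop exits when it reaches zero. */
static void unroll_countdown(short s, const int *restrict in, int *restrict out)
{
    unsigned int count = TRIP_COUNT;
    int i = 0;
    do {
        out[i] = in[i] + s;
        i++;
        count--;
    } while (count != 0);
}

/* Shape of the ivopts dump: the byte offset is the only induction
   variable; it steps by sizeof(int) and is compared against the end
   offset instead of zero. */
static void unroll_offset(short s, const int *restrict in, int *restrict out)
{
    size_t off = 0;
    do {
        int v = *(const int *)((const char *)in + off);
        *(int *)((char *)out + off) = v + s;
        off += sizeof(int);
    } while (off != TRIP_COUNT * sizeof(int));
}

/* Hypothetical helper: runs both forms on the same input and checks
   that they produce identical output. */
static void check_both_forms(short s)
{
    int in[TRIP_COUNT], a[TRIP_COUNT], b[TRIP_COUNT];
    for (int i = 0; i < TRIP_COUNT; i++)
        in[i] = i * 3 - 10;
    unroll_countdown(s, in, a);
    unroll_offset(s, in, b);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(a[0] == in[0] + s);
}
```

Calling check_both_forms(3) from a main() exercises both loops. The sketch only illustrates why the second shape no longer has a "decrement and compare with zero" counter; it says nothing about how a particular backend should recover one.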