On Sun, May 7, 2017 at 11:22 AM, Fredrik Hederstierna
<fredrik.hederstie...@verisure.com> wrote:
> Hi,
>
> I have a question about loop induction variables, related to
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67213
>
> Consider a simple loop like
>
>   int ix;
>   for (ix = 0; ix < 6; ix++) {
>     data[ix] = ix;
>   }
>
> In this case variable 'ix' is used as counting variable for array index,
> but also used as value for storage, so in some sense its used in two 
> different "ways".
>
> Several target architectures might have support to auto-increment pointer 
> registers when storing or reading data,
> but if same variable is used as both array index to pointer and as data it 
> might be more complicated?
>
> In the example above, it seems like GCC loop analyzer (for completely 
> peeling) thinks 'ix' is an induction variable that can be folded away:
> from 'tree-ssa-loop-ivcanon.c':
>
>   Loop 1 iterates 5 times.
>   Loop 1 iterates at most 5 times.
>   Estimating sizes for loop 1
>    BB: 3, after_exit: 0
>     size:   0 _4 = (char) i_9;
>      Induction variable computation will be folded away.
>     size:   1 data[i_9] = _4;
>     size:   1 i_6 = i_9 + 1;   <----- OK?
>      Induction variable computation will be folded away.
>     size:   1 ivtmp_7 = ivtmp_1 - 1;
>      Induction variable computation will be folded away.
>     size:   2 if (ivtmp_7 != 0)
>      Exit condition will be eliminated in peeled copies.
>    BB: 4, after_exit: 1
>   size: 5-4, last_iteration: 5-2
>     Loop size: 5
>     Estimated size after unrolling: 5
>
> Then index 'ix' is considered to be a induction variable, but possibly it 
> cannot be simply folded since its used in other ways as well?

Well, but all the listed stmts _can_ be folded away because the
starting value is zero and thus they "fold away" to constants.

> Though completely-peeling loop resulted in longer code:
>
>
>   int i;
>   char _4;
>   unsigned int ivtmp_7;
>   char _12;
>   unsigned int ivtmp_15;
>   char _19;
>   unsigned int ivtmp_22;
>   char _26;
>   unsigned int ivtmp_29;
>   char _33;
>   unsigned int ivtmp_36;
>   char _40;
>   unsigned int ivtmp_43;
>
>   <bb 2>:
>   _12 = 0;
>   data[0] = _12;
>   i_14 = 1;
>   ivtmp_15 = 5;
>   _19 = (char) i_14;
>   data[i_14] = _19;
>   i_21 = i_14 + 1;
>   ivtmp_22 = ivtmp_15 + 4294967295;
>   _26 = (char) i_21;
>   data[i_21] = _26;
>   i_28 = i_21 + 1;
>   ivtmp_29 = ivtmp_22 + 4294967295;
>   _33 = (char) i_28;
>   data[i_28] = _33;
>   i_35 = i_28 + 1;
>   ivtmp_36 = ivtmp_29 + 4294967295;
>   _40 = (char) i_35;
>   data[i_35] = _40;
>   i_42 = i_35 + 1;
>   ivtmp_43 = ivtmp_36 + 4294967295;
>   _4 = (char) i_42;
>   data[i_42] = _4;
>   i_6 = i_42 + 1;
>   ivtmp_7 = ivtmp_43 + 4294967295;
>   return;
>
>
> instead of original and shorter
>
>   int i;
>   unsigned int ivtmp_1;
>   char _4;
>   unsigned int ivtmp_7;
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_9 = PHI <i_6(4), 0(2)>
>   # ivtmp_1 = PHI <ivtmp_7(4), 6(2)>
>   _4 = (char) i_9;
>   data[i_9] = _4;
>   i_6 = i_9 + 1;
>   ivtmp_7 = ivtmp_1 - 1;
>   if (ivtmp_7 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   goto <bb 3>;
>
>   <bb 5>:
>   return;

The cost metrics assume constant propagation happened which in the
above code didn't yet:

>   i_14 = 1;
>   _19 = (char) i_14;
...

So you are comparing (visually) apples and oranges.

>
> Example for ARM target machine code it became longer:
>
> 0000001c <test_iter_6>:
>   1c:   e59f3030        ldr     r3, [pc, #48]   ; 54 <test_iter_6+0x38>
>   20:   e3a02000        mov     r2, #0
>   24:   e5c32000        strb    r2, [r3]
>   28:   e3a02001        mov     r2, #1
>   2c:   e5c32001        strb    r2, [r3, #1]
>   30:   e3a02002        mov     r2, #2
>   34:   e5c32002        strb    r2, [r3, #2]
>   38:   e3a02003        mov     r2, #3
>   3c:   e5c32003        strb    r2, [r3, #3]
>   40:   e3a02004        mov     r2, #4
>   44:   e5c32004        strb    r2, [r3, #4]
>   48:   e3a02005        mov     r2, #5
>   4c:   e5c32005        strb    r2, [r3, #5]
>   50:   e12fff1e        bx      lr
>   54:   00000000        .word   0x00000000
>
> compared to if complete-loop-peeling was not done
>
> 0000001c <test_iter_6>:
>   1c:   e59f2014        ldr     r2, [pc, #20]   ; 38 <test_iter_6+0x1c>
>   20:   e3a03000        mov     r3, #0
>   24:   e7c33002        strb    r3, [r3, r2]
>   28:   e2833001        add     r3, r3, #1
>   2c:   e3530006        cmp     r3, #6
>   30:   1afffffb        bne     24 <test_iter_6+0x8>
>   34:   e12fff1e        bx      lr
>   38:   00000000        .word   0x00000000
>
> Producing 15 instead of 8 words, giving ~100% larger code size.
>
> And same for x86 arch:
>
>    f:   c6 05 00 00 00 00 00    movb   $0x0,0x0(%rip)        # 16
> <test_iter_6+0x7>
>   16:   c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # 1d
> <test_iter_6+0xe>
>   1d:   c6 05 00 00 00 00 02    movb   $0x2,0x0(%rip)        # 24
> <test_iter_6+0x15>
>   24:   c6 05 00 00 00 00 03    movb   $0x3,0x0(%rip)        # 2b
> <test_iter_6+0x1c>
>   2b:   c6 05 00 00 00 00 04    movb   $0x4,0x0(%rip)        # 32
> <test_iter_6+0x23>
>   32:   c6 05 00 00 00 00 05    movb   $0x5,0x0(%rip)        # 39
> <test_iter_6+0x2a>
>   39:   c3                      retq
>
>
> Without complete-loop-peeling done:
>
>    f:   31 c0                   xor    %eax,%eax
>   11:   88 80 00 00 00 00       mov    %al,0x0(%rax)
>   17:   48 ff c0                inc    %rax
>   1a:   48 83 f8 06             cmp    $0x6,%rax
>   1e:   75 f1                   jne    11 <test_iter_6+0x2>
>   20:   c3                      retq
>
>
> Producing 43 instead of 18 bytes, ~140% bigger code size.
>
> Should a variable be considered to be induction variable also when its used 
> both as LHS array index and RHS data value in same loop?
> Since its cross-target for both ARM and x86, does it origin in some cost 
> estimation on how 'ix' or other induction variables will be folded or not in 
> the final target code?

Yes, the "heuristic" is off in some cases (it adds some optimistic
factor of followup optimization opportunities IIRC).  But not for
the reason you think.

Richard.

> Thanks, Kind Regards,
> Fredrik
>

Reply via email to