Adam Ruppe:
> foreach(i; retro(iota(0, 10))) { }

Oh, right! Or even just:

foreach (i; retro(iota(10))) {}

But abstraction has a cost, see below. I have written three test programs.

----------------

// test1
import std.c.stdio: printf;
void main() {
    enum int N = 100_000_000;
    int count;
    for (int i; i < N; i++)
        count++;
    printf("%d\n", count);
}

Asm of the inner code of the test1, opt version:

5:              inc     ECX
                inc     EAX
                cmp     EAX,05F5E100h
                jb      L5

----------------

// test2
import std.c.stdio: printf;
void main() {
    enum int N = 100_000_000;
    int count;
    foreach_reverse (i; 0 .. N)
        count++;
    printf("%d\n", count);
}

Asm of the inner code of the test2, opt version:

LF:             dec     ECX
                inc     EDX
                mov     EAX,ECX
                inc     EAX
                jg      LF

----------------

// test3
import std.c.stdio: printf;
import std.range: iota, retro;
void main() {
    enum int N = 100_000_000;
    int count;
    foreach (i; retro(iota(N)))
        count++;
    printf("%d\n", count);
}

Asm of the inner code of the test3, opt version:

L6A:            inc     EBX
                lea     EAX,010h[ESP]
                call    near ptr 
_D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612eD0677AACB4B6BC826B92E7DBB9E1E359
                cmp     dword ptr 020h[ESP],0
                jne     L6A


_D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612eD0677AACB4B6BC826B92E7DBB9E1E359
        comdat
L0:             sub     ESP,01Ch
                mov     ECX,offset 
FLAT:_D3std5range145__T4TakeTS3std5range112__T8SequenceVAyaa27_612e6669656c645b305d202b206e202a20612e0B5C6D6E0C89B8D48DF56A414048DB6F
                push    EBX
                push    ESI
                mov     ESI,EAX
                mov     EDX,010h[ESI]
                mov     0Ch[ESP],EAX
                neg     EDX
                sbb     EDX,EDX
                neg     EDX
                xor     DL,1
                mov     010h[ESP],ECX
                je      L46
                mov     EDX,ECX
                push    dword ptr FLAT:_DATA[064h]
                push    dword ptr FLAT:_DATA[060h]
                push    051Fh
                mov     EAX,018h[ESP]
                mov     EBX,018h[ESP]
                call    EDX
                push    EDX
                push    EAX
                call    near ptr _D3std9contracts7bailOutFAyaixAaZv
L46:            dec     dword ptr 010h[ESI]
                pop     ESI
                pop     EBX
                add     ESP,01Ch
                ret


_D3std9contracts7bailOutFAyaixAaZv is extern.

------------------

Running time, best of 3, seconds:
  test1:     0.31
  test1 opt: 0.07
  test2:     0.31
  test2 opt: 0.12
  test3:     6.38
  test3 opt: 0.52

not opt version = dmd (no other option)
opt version = dmd -O -release -inline

Compile times opt version, seconds:
  test1: 0.05
  test2: 0.05
  test3: 0.28

So with the current dmd compiler in the inner loop in a performance-critical 
routine you can't use foreach(retro(iota(N))).

In future a D compiler can recognize and optimize such usage of foreach(iota()) 
and foreach(retro(iota())) as ShedSkin does with usage of xrange()/range() in a 
loop.

Bye,
bearophile

Reply via email to