On 08/15/2016 12:09 PM, Ali Çehreli wrote:
> dmd does not allow anything larger.

Could you please help me understand the following results, possibly by analyzing the produced assembly?

I wanted to see whether there is any performance penalty in following D's recommendation to use dynamic arrays for data larger than 16MiB.

Here is the test code:

enum size = 15 * 1024 * 1024;

version (STATIC) {
    ubyte[size] arr;
}
else {
    ubyte[] arr;

    static this() {
        arr = new ubyte[](size);
    }
}

void main() {
    auto p = arr.ptr;

    foreach (j; 0 .. 100) {
        foreach (i; 0..arr.length) {
            version (POINTER) {
                p[i] += cast(ubyte)i;
            }
            else {
                arr[i] += cast(ubyte)i;
            }
        }
    }
}
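The 'time' utility used below measures the whole process, including startup and the allocation in static this(). To separate the cost of the loop itself, the two access styles can also be timed in-process. This is only a sketch: the function names viaSlice, viaPointer and checksum are made up for illustration, it uses core.time.MonoTime for timing, and it covers only the dynamic-array case. Here the slice is a function parameter rather than a module-level variable, which may itself change what the optimizer can do, so any difference from the numbers below is a hint, not a conclusion.

```d
// Hypothetical harness (not part of the original test): times the two
// access styles in-process, excluding process startup and allocation.
import core.time : MonoTime;
import std.stdio : writefln;

enum size = 15 * 1024 * 1024;

// 100 passes over the array, indexing through the slice.
void viaSlice(ubyte[] arr) {
    foreach (j; 0 .. 100) {
        foreach (i; 0 .. arr.length) {
            arr[i] += cast(ubyte)i;
        }
    }
}

// The same 100 passes, indexing through the raw .ptr pointer.
void viaPointer(ubyte[] arr) {
    auto p = arr.ptr;
    foreach (j; 0 .. 100) {
        foreach (i; 0 .. arr.length) {
            p[i] += cast(ubyte)i;
        }
    }
}

// Sums the bytes; printing the sum keeps the updates observable.
ulong checksum(const(ubyte)[] arr) {
    ulong sum = 0;
    foreach (b; arr) {
        sum += b;
    }
    return sum;
}

void main() {
    auto arr = new ubyte[](size);  // allocated before any timing starts

    auto t0 = MonoTime.currTime;
    viaSlice(arr);
    auto t1 = MonoTime.currTime;
    viaPointer(arr);
    auto t2 = MonoTime.currTime;

    writefln("[] operator: %s ms", (t1 - t0).total!"msecs");
    writefln(".ptr:        %s ms", (t2 - t1).total!"msecs");
    writefln("checksum:    %s", checksum(arr));
}
```

It can be built with the same dmd and ldc2 flags as below, minus the -version switches.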

My CPU is an i7 with a 4MiB L3 cache:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping:              3
CPU MHz:               513.953
CPU max MHz:           3400.0000
CPU min MHz:           400.0000
BogoMIPS:              5615.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K

I tried two compilers:

- DMD64 D Compiler v2.071.2-b2

- LDC - the LLVM D compiler (1.0.0):
   based on DMD v2.070.2 and LLVM 3.8.0

As seen in the code, I tried two version identifiers:

-  STATIC: Use static array
-    else: Use dynamic array

- POINTER: Access array elements through .ptr
-    else: Access array elements through the [] operator

Together with the two compilers, that gave me 8 combinations. Below, I list the compilation command line that I used for each and the wall-clock time that each program execution took (as reported by the 'time' utility).

1) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC -version=POINTER

   4.332s


2) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=STATIC

   4.238s


3) dmd deneme.d -ofdeneme -O -boundscheck=off -inline -version=POINTER

   4.321s


4) dmd deneme.d -ofdeneme -O -boundscheck=off -inline

   3.845s  <== BEST for dmd


5) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER -d-version=STATIC

   0.469s


6) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=STATIC

   0.472s


7) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off -d-version=POINTER

   0.182s  <== BEST for ldc2


8) ldc2 deneme.d -ofdeneme -O5 -release -boundscheck=off

   0.792s


So, with dmd, following the recommendation and using a dynamic array is faster. Interestingly, accessing the elements through .ptr is actually slower. How can that be?

With ldc2, the best option is to use a dynamic array ONLY IF you access the elements through the .ptr property. As the last result shows, using the [] operator on the dynamic array is about 4 times slower than that (0.792s vs. 0.182s).

Does that make sense to you? Why would that be?

Thank you,
Ali
