On Monday, 24 September 2018 at 14:31:45 UTC, Steven Schveighoffer wrote:
>> Why is the overhead so big for a single allocation of an array with elements containing no indirections (which the GC doesn't need to scan for pointers)?
>
> It's not scanning the blocks. But it is scanning the stack.

OK, I modified the code to be:

import std.stdio;

void* mallocAndFreeBytes(size_t byteCount)()
{
    import core.memory : pureMalloc, pureFree;
    void* ptr = pureMalloc(byteCount);
    pureFree(ptr);
    return ptr;                 // for side-effects
}

void main(string[] args)
{
    import std.datetime.stopwatch : benchmark;
    import core.time : Duration;

    immutable benchmarkCount = 1;

    // malloc/free vs. GC.collect() for block sizes 2^^0 .. 2^^30
    static foreach (const i; 0 .. 31)
    {
        {
            enum byteCount = 2^^i;
            // Pass byteCount (not the exponent i) so the allocation matches the printed size.
            const Duration[1] resultsC = benchmark!(mallocAndFreeBytes!(byteCount))(benchmarkCount);
            writef("%s bytes: mallocAndFreeBytes: %s nsecs",
                   byteCount, cast(double)resultsC[0].total!"nsecs"/benchmarkCount);

            import core.memory : GC;
            auto dArray = new byte[byteCount]; // byteCount bytes; one GiB only on the last iteration
            const Duration[1] resultsD = benchmark!(GC.collect)(benchmarkCount);
            writefln(" GC.collect(): %s nsecs after %s",
                     cast(double)resultsD[0].total!"nsecs"/benchmarkCount, dArray.ptr);
            dArray = null;
        }
    }
}

I still believe these numbers are absolutely horrible:

1 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 21600 nsecs after 7F1ECC0B1000
2 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20800 nsecs after 7F1ECC0B1010
4 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20500 nsecs after 7F1ECC0B1000
8 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 20300 nsecs after 7F1ECC0B1010
16 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 23200 nsecs after 7F1ECC0B2000
32 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 19600 nsecs after 7F1ECC0B1000
64 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 17800 nsecs after 7F1ECC0B2000
128 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 16600 nsecs after 7F1ECC0B1000
256 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 16200 nsecs after 7F1ECC0B2000
512 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 15900 nsecs after 7F1ECC0B1000
1024 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 15700 nsecs after 7F1ECC0B2000
2048 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14600 nsecs after 7F1ECC0B1010
4096 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 14400 nsecs after 7F1ECC0B2010
8192 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0B4010
16384 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14100 nsecs after 7F1ECC0B7010
32768 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0BC010
65536 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0C5010
131072 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0D6010
262144 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 14200 nsecs after 7F1ECC0F7010
524288 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 17500 nsecs after 7F1ECAC14010
1048576 bytes: mallocAndFreeBytes: 200 nsecs GC.collect(): 18000 nsecs after 7F1ECAC95010
2097152 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 18700 nsecs after 7F1ECAD96010
4194304 bytes: mallocAndFreeBytes: 300 nsecs GC.collect(): 20000 nsecs after 7F1ECA514010
8388608 bytes: mallocAndFreeBytes: 400 nsecs GC.collect(): 61000 nsecs after 7F1EC9913010
16777216 bytes: mallocAndFreeBytes: 24900 nsecs GC.collect(): 27100 nsecs after 7F1EC8112010
33554432 bytes: mallocAndFreeBytes: 800 nsecs GC.collect(): 36600 nsecs after 7F1EC5111010
67108864 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 57900 nsecs after 7F1EBF110010
134217728 bytes: mallocAndFreeBytes: 500 nsecs GC.collect(): 98300 nsecs after 7F1EB310F010
268435456 bytes: mallocAndFreeBytes: 700 nsecs GC.collect(): 175700 nsecs after 7F1E9B10E010
536870912 bytes: mallocAndFreeBytes: 600 nsecs GC.collect(): 326900 nsecs after 7F1E6B10D010
1073741824 bytes: mallocAndFreeBytes: 900 nsecs GC.collect(): 641500 nsecs after 7F1E0B04B010
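
As a rough sanity check on the largest entries, here is a small sketch that divides the reported GC.collect() time by the number of pages in the block, to see whether the growth looks like a roughly constant cost per page. The 4 KiB page size, the `Sample` struct, and the `nsecsPerPage` helper are my own assumptions for illustration; the sample values are copied from the output above. This is only arithmetic on the numbers, not a profile of the GC.

```d
import std.stdio : writefln;

// Assumption: the GC manages memory in 4 KiB pages.
enum size_t assumedPageSize = 4096;

// Hypothetical helper: collection time divided by the number of
// (assumed) pages that the block occupies, rounded up.
double nsecsPerPage(size_t byteCount, double collectNsecs)
{
    const pages = (byteCount + assumedPageSize - 1) / assumedPageSize;
    return collectNsecs / pages;
}

// Hypothetical record type for one line of the benchmark output.
struct Sample
{
    size_t byteCount;
    double collectNsecs;
}

void main()
{
    // (byteCount, GC.collect() nsecs) pairs copied from the output above.
    static immutable Sample[] samples = [
        Sample(33_554_432, 36_600),
        Sample(67_108_864, 57_900),
        Sample(134_217_728, 98_300),
        Sample(268_435_456, 175_700),
        Sample(536_870_912, 326_900),
        Sample(1_073_741_824, 641_500),
    ];

    foreach (s; samples)
        writefln("%s bytes: %.2f nsecs per assumed 4 KiB page",
                 s.byteCount, nsecsPerPage(s.byteCount, s.collectNsecs));
}
```

If the per-page figure settles towards a constant as the block grows, that would hint at per-page bookkeeping rather than scanning of the block's contents, but it doesn't prove it.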

How is it possible for the GC to be 500-1000 times slower than a malloc-free call for a single array of plain bytes with no indirections, in such a simple program?

I really don't understand this...
