On 7 February 2012 19:39, Marco Leise <marco.le...@gmx.de> wrote:
> Hi, this is me again with another "size matters" topic. This time, it's not
> the executable size, no! Instead I want to discuss a runtime memory
> footprint and speed issue that affects everyone, and how to improve the
> situation dramatically.
> In D we allocate memory through the GC, and it is initialized according to
> the type's .init, which gives us a safe default. In most cases this
> results in the memory block being zeroed out, as when allocating ubyte[]
> buffers. Let's assume we have a program that allocates some buffers in
> advance that it may not fully use. This often happens when the input data
> is much smaller than anticipated. So our memory manager should handle
> this situation well:
>   o  zero out a memory block
>   o  we probably don't need all of it
> So here is a small benchmark that allocates 512 * 1 MB, first using the
> typical method: new ubyte[1024 * 1024]. The output is:
>        ** new ubyte[1024 * 1024]
>           ressource usage: +526840 KB
>           user time: +0.098s | sys. time: +0.368s
> As expected, physical memory usage grows by ~512 MB, and we spend a
> considerable amount of time in the kernel (sys. time) supplying physical
> memory and in our program (user time) initializing the data to zero. Can
> we use the GC more directly? Let's try GC.calloc:
>        ** GC.calloc(1024 * 1024)
>           ressource usage: +525104 KB
>           user time: +0.089s | sys. time: +0.370s
> Again, 512 MB and about the same time. Nothing gained, but my RAM is
> starting to fill up. By the way, how does a good old C library call to
> 'malloc' compare? It gives us a block of garbage-"initialized" data, a
> situation we left behind for good in D! So here we go with another test:
>        ** malloc(1024 * 1024)
>           ressource usage: +2048 KB
>           user time: +0.000s | sys. time: +0.002s
> Oh nice! May I say... these 512 calls were for free? 2 MB and 0.002
> seconds ain't worth talking about. The operating system didn't actually
> allocate the memory; it merely gave us a virtual memory range to use.
> Only when we write to the memory is it backed by physical memory. That's
> perfect for a generously sized buffer, right? Well... we still want it
> zeroed out, so let's initialize the data to zero with
> ptr[0 .. 1024 * 1024] = 0:
>        ** malloc(1024 * 1024) + zero out
>           ressource usage: +526336 KB
>           user time: +0.053s | sys. time: +0.366s
> ... and we are back at square one, except that the user time is notably
> lower. What we need is a facility that gives us lazily allocated,
> zeroed-out memory. And guess what, it's not too much to ask for. Here is
> 'calloc' to the rescue:
>        ** calloc(1, 1024 * 1024)
>           ressource usage: +2048 KB
>           user time: +0.001s | sys. time: +0.001s
> How does it work? The operating system fakes the memory allocation and
> just gives us 131072 references (512 MB / 4 KB per page) to a special
> read-only page of zeroes. The semantics are copy-on-write: we start with a
> view of zeroed-out memory and get a real, private page once we write into
> it. (Sorry if this is nothing new to some of you, but I just found it out
> today ;) )
> My question is: should we improve druntime with that knowledge? I'm not
> aware of any caveats; are there any?
> Thanks for reading. The test program for Linux is attached (I compiled
> it with GDC).
> -- Marco
What about these functions?

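import std.array : uninitializedArray, minimallyInitializedArray;
import std.stdio : writeln;

// ALLOCS, prev and print_ressources come from the benchmark program
// attached to the original message.
ubyte[][ALLOCS] b, c;

// uninitializedArray: the GC block is not initialized at all.
writeln("** uninitializedArray!(ubyte[])(1024*1024)");
foreach (i; 0 .. ALLOCS) b[i] = uninitializedArray!(ubyte[])(1024 * 1024);
prev = print_ressources(&prev);

// minimallyInitializedArray: only initializes what memory safety
// requires (element types with indirections).
writeln("** minimallyInitializedArray!(ubyte[])(1024*1024)");
foreach (i; 0 .. ALLOCS) c[i] = minimallyInitializedArray!(ubyte[])(1024 * 1024);
prev = print_ressources(&prev);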
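For comparison with the calloc measurement in the quoted message, here is a minimal sketch of wrapping C's calloc in a D slice. The helper names callocBuffer and freeBuffer are hypothetical (they are not part of druntime or Phobos). The lazy, copy-on-write behaviour is what the quoted benchmark observed on Linux; it is an operating system / C library detail, not something the C standard guarantees. The buffer is not managed by the GC and must be released manually.

```d
// Sketch: a zero-filled buffer from the C library's calloc, used as a D slice.
// Assumption: on Linux, large calloc requests get fresh pages from the OS,
// so physical memory is only committed when a page is first written.
import core.stdc.stdlib : calloc, free;
import core.exception : onOutOfMemoryError;

/// Hypothetical helper: returns `length` zero-initialized bytes (not GC-managed).
ubyte[] callocBuffer(size_t length)
{
    auto p = cast(ubyte*) calloc(1, length);
    // calloc(1, 0) may legitimately return null, so only fail for length > 0.
    if (p is null && length != 0)
        onOutOfMemoryError();
    return p[0 .. length];
}

/// Hypothetical helper: releases a buffer from callocBuffer and nulls the slice.
void freeBuffer(ref ubyte[] buf)
{
    free(buf.ptr); // free(null) is a no-op
    buf = null;
}

void main()
{
    import std.algorithm.searching : all;
    import std.stdio : writeln;

    enum bufSize = 1024 * 1024;
    auto buf = callocBuffer(bufSize);
    scope(exit) freeBuffer(buf);

    // Every byte reads as zero without any explicit initialization.
    assert(buf.all!(x => x == 0));

    buf[0] = 42; // the first write to a page is what commits it
    writeln(buf.length, " ", buf[0]); // prints "1048576 42"
}
```

Because such memory is invisible to the GC, any GC pointers stored in it would also need registering (for example with GC.addRange); for plain ubyte[] buffers that is not an issue.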


Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';