On 6/7/17 3:56 AM, Biotronic wrote:
On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
[snip]

It seems to me this is a topic worthy of a more in-depth article. If
only I felt up to that. :p

Your understanding and explanation are excellent, actually!


When you create a slice 'a' in D (with the current GC and druntime, at
least), what happens behind the scenes is the allocator chops off a
block of N bytes, where N is some number larger than
a.length*typeof(a[0]).sizeof. For an array of two ints, N is 16.
For good measure, let's make a copy 'b' of that slice (it will come in
handy later):

int[] a = [1, 2];
int[] b = a;

import std.stdio;
writeln(a.capacity);
3
writeln(b.capacity);
3

The capacity is 3. Intriguing, as a block of 16 bytes should be able to
hold 4 ints.

We can ask the GC for more info on this block:

import core.memory;
auto info = GC.query(a.ptr);
writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
0x2211010, 16, 10

That's the pointer to the start of the block, the size of the block, and
various attributes (appendable, e.g.).
We can get the raw data of the block:

auto block = (cast(ubyte*)info.base)[0..info.size];
writeln(block);
[1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]

We can see our 1 and 2 in there, and a curious 8 at the end. That's the
currently used data, in bytes. That's also the reason the capacity is 3,
not 4 - this info has to live somewhere. If we were to append another
element, and print the data again:

a ~= 3;
writeln(block);
[1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]

See how the last byte changed to a 12? That just so happens to be
a.length*int.sizeof.

To be more specific: for blocks <= 256 bytes, 1 byte is reserved at the end of the block to store the length. For blocks > 256 bytes and <= 2048 bytes, 2 bytes are reserved at the end of the block to store the length of the array.

Larger blocks are PAGE size and up, and those have a special feature: they are not limited to a power of 2 in size, and can be extended literally in-place by tacking on additional PAGEs. I wanted to just put a size_t at the end of those as well, but the problem with that is that the end of the block (and with it the stored length) would move around as the block was extended or shrunk. Given how the runtime works, it's also possible that two threads could be appending to a shared array at the same time. So I decided to store the length at the beginning of the block instead.
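
Here is a rough sketch of how those size classes show up in practice (the inspect helper is mine, and the comments are assumptions based on the description above; exact numbers vary by druntime version):

import core.memory;
import std.stdio;

void inspect(T)(T[] arr, string name)
{
    auto info = GC.query(arr.ptr);
    writefln("%s: length=%s capacity=%s block size=%s",
             name, arr.length, arr.capacity, info.size);
}

void main()
{
    inspect(new int[](2), "small");    // block <= 256 bytes: 1 byte of length padding at the end
    inspect(new int[](200), "medium"); // block > 256 and <= 2048 bytes: 2 bytes of padding at the end
    inspect(new int[](2000), "large"); // PAGE size and up: length stored at the front of the block
}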

I would actually like to replace this mechanism with one that stores the length outside the block, in a separate memory space, as the current scheme is horrible for caches. Allocate a page worth of bytes, and you actually get 2 pages - size_t.sizeof.
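
A quick way to see that overhead in action (the exact figure is my assumption about the current allocator, not a guarantee):

import core.memory;
import std.stdio;

void main()
{
    auto arr = new ubyte[](4096);     // ask for exactly one page of data
    writeln(GC.query(arr.ptr).size);  // expected: 8192, since the hidden length pushes the block to a second page
}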

Note, for types with destructors, more bytes are reserved to store the type info of the array elements. I didn't write that part, so I'm not 100% sure how it works.
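
If you want to observe that from the outside, a hedged sketch (I'm not asserting how much is reserved or where, only that the capacities may differ):

import std.stdio;

struct NoDtor { int x; }
struct WithDtor { int x; ~this() {} }

void main()
{
    // Same element size, but the array whose elements have a destructor may
    // report a smaller capacity, since extra bytes in the block are reserved
    // for the element type info.
    writeln((new NoDtor[](2)).capacity);
    writeln((new WithDtor[](2)).capacity);
}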

Now for a curious thing: what happens to a's capacity when we append to b?

b.assumeSafeAppend();
b ~= 4;
writeln(a.capacity);
3

As above, the length of a in bytes equals the used bytes in the
allocated memory block, and so both a and b have capacity again.

Yes, for this reason, calling assumeSafeAppend is unsafe and can *never* be part of the normal treatment of arrays. It is on you, the programmer, to ensure that no references to the no-longer-valid portion of the array exist. The compiler can't ensure it, a library function can't ensure it, and such references are similar to dangling pointers.

Only a library type which encapsulates the array completely can use assumeSafeAppend.
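
For example, a type along these lines (a hypothetical sketch, not an existing library type) can use it safely, because no slice of its block ever escapes:

// Hypothetical buffer that owns its storage outright.
struct ScratchBuffer
{
    private int[] data;

    void put(int x) { data ~= x; }

    // Drop the contents but keep the allocated block for reuse.
    void clear()
    {
        data.length = 0;
        // Safe by construction: no slice of data ever escapes this struct,
        // so shrinking and re-appending cannot stomp anyone else's view.
        data.assumeSafeAppend();
    }

    // Hand out a copy, never the internal slice, to preserve that invariant.
    int[] contents() const { return data.dup; }
}

The key point is the invariant, not the type itself: as soon as any outside code can hold a slice of data, the assumeSafeAppend in clear() becomes exactly the hazard described above.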

For example, imagine these are immutable arrays: you have now overwritten immutable data!
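
To make that concrete, here is a hedged sketch of the immutable case (it assumes the appends stay within one block, as in the walkthrough above; a different runtime could behave differently):

import std.stdio;

void main()
{
    immutable(int)[] a;
    a ~= 1;                // appending to an immutable slice is fine: it only writes unused space
    a ~= 2;
    auto b = a;            // b shares a's block
    a ~= 3;                // fits in the spare capacity, so a and b still share the block
    b.assumeSafeAppend();  // claims nothing references the block past b's end (but a does)
    b ~= 4;                // appends in place, stomping the immutable 3
    writeln(a);            // likely prints [1, 2, 4]: immutable data has been changed
}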

-Steve
