In C, C++, and D people have allocated memory for a long time in the following manner:

1. Allocate as many bytes as needed (e.g. by using malloc);
2. Mess with the memory allocated.

(C++ took this one step further by defining class-specific allocators, feature that D copied. That turned out to be little else than a lateral step. C++ also offers distinct allocation for arrays vs. anything else, but because it lacks typed reallocation primitives, it effectively hamstrung that feature's capabilities. Also the STL introduced allocators, which turned out to not do much of anything useful except being an immense time sink for everyone involved with them in any capacity. I call these cumulatively The Big Distraction.)

I've just had a mini-epiphany that's been buzzing around my head for a while. Finally it got to a form expressible in words:

====
Type should inform allocation.
====

I.e. you shouldn't first allocate and then brand the memory with a type. Instead, you should make the type known _during_ the allocation process.

This is not new and on the face of it not even all that noteworthy: (a) some high-level languages did this for years, (b) it's unclear which exact properties of a type should be relevant to the allocation.

Within the context of D, there are some interesting aspects of a type that a complex modular allocator could use to great effect. Consider:

1. Individual objects (i.e. not arrays)

When allocating an individual object, there is virtually zero chance that object will ever need to be resized. (There are exceptions such as "the struct hack" but those would be handled together with arrays.) There are two important consequences of this:

(a) Individual object sizes are already quantized, i.e. there is a relatively small set of distinct sizes compared to the number of objects being allocated.

(b) Individual objects can be placed in an allocator that's arbitrarily adverse to reallocation (i.e. has no in-place expansion ability and no clever reallocation primitives). It seems like FreeTree (file:///Users/aalexandre/code/d/dlang.org/web/phobos-prerelease/std_experimental_allocator_free_tree.html) would be an excellent fit for allocating individual objects. HeapBlock also comes to mind.

2. Arrays

The range of sizes allocated for arrays is quite a bit larger and more arbitrary than for individual objects. Arrays still grow in quanta (e.g. T.sizeof hops) but the set of distinct sizes is much larger and much less predictable.

One other interesting thing is that arrays offer concatenation and resizing as primitive operations, so it follows that their underlying allocator should be friendly to such. This leads to an allocator design for arrays that's radically different from the allocator design for individual objects.

Long story short, arrays should sit on a different heap than objects.

3. Thread-local vs. shared objects

Currently in D it's legal to allocate memory in one thread and deallocate it in another. (One simple way to look at it is casting to shared.) This has a large performance cost that only benefits very few actual cases.

It follows that we need to change the notion that you first allocate memory and then brand it as shared. The "will be shared" knowledge must be present during allocation, and use different methods of allocation for the two cases.

4. "Has pointers" vs. "has no pointers"

When collecting a specific object, it's important to know whether its type embeds pointers. If it does, more scanning is necessary. Otherwise, the object is a "leaf" leading to no others.

It seems to me we stand to gain by physically and conceptually separating memory that stores leaf objects from that dedicated to non-leaf objects. Making the scan-heavy memory more compact leads to better cache friendliness.

Currently D's allocator allows choosing at runtime for each allocated block whether is should be scanned. The new design would require making the choice upfront. There would be, of course, the "anything goes" heap that still allows that level of dynamism.

5. "Plan to deallocate" vs. "plan to let the GC mind this"

Objects that will be deallocated manually deserve different placement compared to those that will only be deallocated during a GC cycle.

6. Immutable vs. mutable

I'm not sure how this can be practically exploited. There must be something in the notion that, save for a brief write period after allocation, the memory will stay unmodified. Still thinking.

7. Strings

On the face of it strings are immutable arrays, so they've been already discussed. However, they have some interesting characteristics:

(a) small alignment, e.g. 1 byte

(b) often extremely short, e.g. 1-8 bytes

(c) concatenation intensive (including append)

(d) perhaps other unique properties and patterns that elude me

So... types should inform allocation.


Please chime in!

Andrei

Reply via email to