A few thoughts on std.allocator

Andrei Alexandrescu via Digitalmars-d Sun, 10 May 2015 02:50:53 -0700

In C, C++, and D people have allocated memory for a long time in thefollowing manner:


1. Allocate as many bytes as needed (e.g. by using malloc);
2. Mess with the memory allocated.

(C++ took this one step further by defining class-specific allocators,feature that D copied. That turned out to be little else than a lateralstep. C++ also offers distinct allocation for arrays vs. anything else,but because it lacks typed reallocation primitives, it effectivelyhamstrung that feature's capabilities. Also the STL introducedallocators, which turned out to not do much of anything useful exceptbeing an immense time sink for everyone involved with them in anycapacity. I call these cumulatively The Big Distraction.)

I've just had a mini-epiphany that's been buzzing around my head for awhile. Finally it got to a form expressible in words:


====
Type should inform allocation.
====

I.e. you shouldn't first allocate and then brand the memory with a type.Instead, you should make the type known _during_ the allocation process.

This is not new and on the face of it not even all that noteworthy: (a)some high-level languages did this for years, (b) it's unclear whichexact properties of a type should be relevant to the allocation.

Within the context of D, there are some interesting aspects of a typethat a complex modular allocator could use to great effect. Consider:


1. Individual objects (i.e. not arrays)

When allocating an individual object, there is virtually zero chancethat object will ever need to be resized. (There are exceptions such as"the struct hack" but those would be handled together with arrays.)There are two important consequences of this:

(a) Individual object sizes are already quantized, i.e. there is arelatively small set of distinct sizes compared to the number of objectsbeing allocated.

(b) Individual objects can be placed in an allocator that's arbitrarilyadverse to reallocation (i.e. has no in-place expansion ability and noclever reallocation primitives). It seems like FreeTree(file:///Users/aalexandre/code/d/dlang.org/web/phobos-prerelease/std_experimental_allocator_free_tree.html)would be an excellent fit for allocating individual objects. HeapBlockalso comes to mind.


2. Arrays

The range of sizes allocated for arrays is quite a bit larger and morearbitrary than for individual objects. Arrays still grow in quanta (e.g.T.sizeof hops) but the set of distinct sizes is much larger and muchless predictable.

One other interesting thing is that arrays offer concatenation andresizing as primitive operations, so it follows that their underlyingallocator should be friendly to such. This leads to an allocator designfor arrays that's radically different from the allocator design forindividual objects.


Long story short, arrays should sit on a different heap than objects.

3. Thread-local vs. shared objects

Currently in D it's legal to allocate memory in one thread anddeallocate it in another. (One simple way to look at it is casting toshared.) This has a large performance cost that only benefits very fewactual cases.

It follows that we need to change the notion that you first allocatememory and then brand it as shared. The "will be shared" knowledge mustbe present during allocation, and use different methods of allocationfor the two cases.


4. "Has pointers" vs. "has no pointers"

When collecting a specific object, it's important to know whether itstype embeds pointers. If it does, more scanning is necessary. Otherwise,the object is a "leaf" leading to no others.

It seems to me we stand to gain by physically and conceptuallyseparating memory that stores leaf objects from that dedicated tonon-leaf objects. Making the scan-heavy memory more compact leads tobetter cache friendliness.

Currently D's allocator allows choosing at runtime for each allocatedblock whether is should be scanned. The new design would require makingthe choice upfront. There would be, of course, the "anything goes" heapthat still allows that level of dynamism.


5. "Plan to deallocate" vs. "plan to let the GC mind this"

Objects that will be deallocated manually deserve different placementcompared to those that will only be deallocated during a GC cycle.


6. Immutable vs. mutable

I'm not sure how this can be practically exploited. There must besomething in the notion that, save for a brief write period afterallocation, the memory will stay unmodified. Still thinking.


7. Strings

On the face of it strings are immutable arrays, so they've been alreadydiscussed. However, they have some interesting characteristics:


(a) small alignment, e.g. 1 byte

(b) often extremely short, e.g. 1-8 bytes

(c) concatenation intensive (including append)

(d) perhaps other unique properties and patterns that elude me

So... types should inform allocation.


Please chime in!

Andrei

A few thoughts on std.allocator

Reply via email to